
Testing Speaking for the E8 Standards

Technical Report 2012

Claudia Mewald Otmar Gassner Rainer Brock Fiona Lackenbauer Klaus Siller

Bundesinstitut für Bildungsforschung, Innovation & Entwicklung des österreichischen Schulwesens, Alpenstraße 121 / 5020 Salzburg

www.bifie.at

Testing Speaking for the E8 Standards. Technical Report 2012. BIFIE Salzburg (ed.), Salzburg 2013

The text and the sample tasks may be downloaded from the homepage (www.bifie.at), copied, and distributed for teaching purposes in Austrian schools, and by university colleges of teacher education and universities for pre-service, in-service, and further teacher training, to the extent required for the respective course. Reproduction of the texts and sample tasks on media other than paper (e.g. in PowerPoint presentations) is likewise permitted for teaching purposes.

Contents

1 SPEAKING TO COMMUNICATE (p. 3)

2 THEORETICAL MODELS (p. 5)
2.1 Models of communicative competence (p. 5)
2.2 Communicative competence in the CEFR (p. 7)
2.2.1 Linguistic competences (p. 8)
2.2.2 Sociolinguistic competence (p. 10)
2.2.3 Pragmatic competence (p. 10)
2.3 The nature of language in unplanned speech (p. 11)

3 TEST DEVELOPMENT (p. 13)
3.1 Issues of standardisation (p. 14)
3.2 Standardising the content (p. 14)
3.2.1 Task (p. 15)
3.3 Standardising the setting (p. 20)
3.3.1 Interlocutor/Assessor characteristics (p. 21)
3.3.2 Interlocutor/Assessor training (p. 21)
3.4 The test takers (p. 23)
3.5 Standardising the construct: construct validation (p. 25)
3.5.1 The Assessment Scale (p. 27)
3.5.2 Test taker feedback (p. 34)

4 E8 SPEAKING TEST SPECIFICATIONS (p. 35)
4.1 Purpose of the test (p. 35)
4.2 Description of test takers (p. 35)
4.3 Test level (p. 35)
4.4 Test Construct (p. 35)
4.4.1 Construct Space (p. 36)
4.5 Structure of the test (p. 40)
4.6 Time allocation (p. 40)
4.7 Rubrics (p. 41)
4.8 Speaking Assessment Scale (p. 41)
4.9 Prompt samples (p. 43)

5 WASHBACK (p. 56)

BIBLIOGRAPHY (p. 57)
APPENDIX (p. 60)

Abbreviations

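ANC Austrian National Curriculum (Österreichischer Lehrplan)
BIFIE Bundesinstitut für Bildungsforschung, Innovation und Entwicklung des österreichischen Schulwesens [Federal Institute for Education Research, Innovation and Development of the Austrian School System]
E8 BIST Bildungsstandards Lebende Fremdsprache (Englisch), 8. Schulstufe [Educational Standards for Modern Foreign Language (English), Grade 8]
CEFR Common European Framework of Reference for Languages: Learning, Teaching, Assessment
EFL English as a foreign language
FL Foreign language(s)
ÖSZ Österreichisches Sprachen-Kompetenz-Zentrum [Austrian Centre for Language Competence]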

1 Speaking to communicate

It is commonly acknowledged that foreign language learners, as well as most stakeholders, consider speaking (or, more comprehensively, oral communication) the most required and important skill to be mastered.

According to Thornbury (2009, p. iv), however, "[i]t is generally accepted that knowing a language and the ability to speak it are not synonymous". Nevertheless, foreign language (FL) teaching has long been practised as if knowing and speaking were the same thing, thereby perpetuating the common misconception that knowing the grammar and some vocabulary, making sentences and pronouncing them properly (ibid., p. iv) in the foreign language amounts to the ability to speak it. Therefore, Thornbury maintains, many courses and teachers still teach how to vocalise grammar rather than how to communicate effectively.

Modern FL teaching, however, supported by research and the sound judgment of its recipients, who first and foremost want to become effective FL speakers able to communicate successfully, has acknowledged that the interactive nature of communication requires communicative competence. Moreover, since the goal of most language learners is the ability to communicate comprehensibly, effectively, and naturally, those components of communicative competence (see p. 7) that are essential for successful communication are at the heart of modern FL teaching and testing.

The recognition that spoken language differs significantly from written text, as determined by the nature of the speaking process, is comparatively recent. It has been made tangible by the CANCODE spoken corpus [1], the Cambridge International Corpus, and modern English dictionaries, which show how English is really used, not how one is supposed to use English or how one uses it in writing [2]. The difference between spoken and written language shows not only in its lexis but also considerably in its grammar (Carter & McCarthy 2006, McCarthy 2006a), which is to be acknowledged in teaching as well as in testing and assessment (also see p. 31).

Consequently, teaching speaking as a skill has to consider aspects of communicative competence (see p. 7), communicative genres relevant for the target group (see p. 18), and productive strategies which FL speakers apply to communicate according to the nature of the communicative task and thereby show their available communicative potential. This seems particularly crucial given that, according to the Austrian National Curriculum for Foreign Languages (ANC), FL education should primarily aim at communicative competence:

The aim of foreign language teaching is the development of communicative competence in the skill areas of listening, reading, spoken interaction, oral production, and writing.

The overarching learning objective in all skill areas is always the ability to communicate successfully, which is not to be confused with error-free communication. (bmukk 2009c, pp. 1–2; translated from German)

The curricular priority on successful communication rather than accuracy suggests a fluency-oriented approach to teaching and assessing speaking (Brown 1999, Ebsworth 1998, Krashen & Terrell 1988, Krashen 2003, McCarthy 2006b, Richards 2008). This is also emphasised by Brock et al. (2008, p. 24), who suggest that the practice of communicative competence is even possible in large classes if teachers manage to let go of correcting and adopt the role of facilitators who enable, support, and encourage speech processes instead. Moreover, they maintain that the explicit demand for all five skills to be addressed equally intensively brings forth the obligation to assess spoken interaction and oral production regularly and reliably.

1 CANCODE = the Cambridge and Nottingham Corpus of Discourse in English
2 See http://www.pearsonlongman.com/dictionaries/corpus/spoken-bnc.html and http://www.natcorp.ox.ac.uk/

The skill areas of listening, reading, spoken interaction, oral production, and writing are to be developed and practised regularly, to approximately equal extents, and in as integrated a manner as possible. (bmukk 2009c, p. 2; translated from German)

However, since the curriculum explicitly requires oral competences to be captured in the overall assessment, in line with the equal status of the skills, CEFR-oriented teaching must include oral forms of testing and practice that reliably reflect both monologic and dialogic speaking competences. (Brock et al. 2008, p. 12; translated from German)

For this reason, the ANC and the E8 Standards (E8 BIST) describe precisely what language learners should be able to do in spoken interaction and oral production in can-do descriptors and CEFR levels. Testing speaking in the E8 BIST context thus relies on an overarching framework which takes the aspects addressed in the three documents into consideration.

In the following, theoretical models of communicative competence and language ability, construct specifications, task specifications, and assessment specifications are described.


2 Theoretical Models

As early as 1961, Lado (p. 239) suggested that the ability to speak was without doubt the most highly prized skill, while testing it was the least developed and the least practised in the field of testing. One might argue that Lado's work on testing is history and that modern FL education has long overcome this mismatch. However, the current state of testing and assessing speaking in Austrian classrooms suggests that testing hardly ever happens in a systematic way and that the ability to speak thus does not have a strong formal impact on the learners' final grades. Therefore, it seemed appropriate and necessary to explore findings from international test development and apply them to the Austrian context in the development of the E8 Speaking Test. Lado (1961) argues that the underrepresentation of testing and assessment in speaking derives from the fact that we lack understanding of what constitutes speaking. Consequently, this section focuses on theories of speaking in order to make transparent how the concepts of speaking and communicative competence have been captured in the literature since the 1960s and, finally, in the E8 Speaking Test.

Modern testing of speaking draws on competence models that accept the view that speaking does not happen in a vacuum but that it is a real-life process, co-constructed between participants talking in specific contexts and situations (Fulcher 2003). Theoretical models for testing speaking which acknowledge the communicative function of that skill therefore define competence models.

2.1 Models of communicative competence

According to Johnson & Johnson (1999, p. 62), communicative competence is the knowledge which enables someone to use a language effectively and their ability to use this knowledge for communication. The term is most usually attributed to Dell Hymes's paper "On communicative competence" (1970).

Since the 1970s the concept of communicative competence has been discussed and redefined by many researchers and authors [3]. Hymes's original proposition, however, usually remains the starting point of discussions of communicative competence. He suggests that, in order to be able to communicate effectively, learners of a foreign language need both linguistic knowledge, which deals with producing grammatically correct sentences, and communicative competence, which deals with producing and understanding sentences that are appropriate and acceptable in a particular situation (Hymes 1972, pp. 284–286). He thus emphasises the difference between knowledge about language and the competence that enables a person to communicate functionally and interactively.

In a later publication Hymes (1974, p. 62) specified the components of speech and suggested grouping them in a mnemonic code that would spell SPEAKING in order to make them memorable:

3 Some of these researchers were Bachman 1990, Bachman & Palmer 1996, Canale & Swain 1980, Fulcher 2003, Luoma 2004, Widdowson 1978, and Wilkins 1976. Their publications quoted in this paper had an impact on the development of the E8 Speaking Test.


1. Setting and Scene (the time and place of a speech act and the psychological setting or cultural definition of an occasion, e.g. range of formality or sense of seriousness)
2. Participants (the speakers and the audience of a speech act)
3. Ends (the purposes, goals, and expected outcomes of a speech act)
4. Acts (the speech acts and speech events)
5. Key (the tone, manner, or spirit of the speech act)
6. Instrumentalities (the forms and styles of speech such as casual register and colloquial features, or formal register and careful grammatical standard forms)
7. Norms (the social rules that govern the event and the participants' actions and reactions)
8. Genre (the kind of speech acts or events, i.e. the types)

(Summarised from Hymes 1974, pp. 52–62)

Most of the above components of speech are discussed in this paper, taking into consideration the content and the context of the E8 Speaking Test: Setting (see p. 22), Participants (see p. 23), Purpose (see p. 35), Speech Acts (see p. 40), Key (see p. 35ff), Instrumentalities (see task types, p. 16), and Genre (see p. 18).

In addition to the already mentioned researchers, several linguists and methodologists (Brumfit & Johnson 1998, Wilkins 1976, Widdowson 1978) took up the notion of communicative competence in the development of communicative language teaching during the 1970s and 1980s. Just a few of them will be mentioned in the following discussion, namely those whose theoretical reflections and empirical work seem to have had the strongest impact on the theory of communicative competence and on the development of the E8 Speaking Test.

Like Hymes (1974), Widdowson (1978) suggests that knowing a language is more than just understanding, speaking, reading, and writing sentences. In fact, he proposes that knowing a language means using sentences to achieve communicative purposes. Additionally, Widdowson (1983) introduces a distinction between competence and capacity.

He refers to communicative competence as the knowledge of linguistic and sociolinguistic conventions but to procedural or communicative capacity as the ability to use knowledge as a means of creating meaning in a language. In this way Widdowson has established the main concern of successful and meaningful communication, which is a predominant feature in the assessment of spoken performances in the E8 Speaking Test.

Canale & Swain (1980) see communicative competence as a synthesis of an underlying system of (conscious or unconscious) knowledge that interacts with other systems of knowledge (e.g. world knowledge) and that is observable in actual communicative performance. This system of knowledge includes knowledge of grammatical principles, the use of language in social contexts to fulfil communicative functions, and the use of discourse principles (Canale & Swain 1980, p. 27).

Canale & Swain thus ground their concept of communicative competence on the following components:


Grammatical competence. This type of competence will be understood to include knowledge of lexical items and rules of morphology, syntax, sentence-grammar semantics, and phonology.

Sociolinguistic competence. This component is made up of two sets of rules: sociocultural rules of use and rules of discourse.

Strategic competence. This component will be made up of verbal and nonverbal communication strategies that may be called into action to compensate for breakdowns in communication due to performance variables or to insufficient competence. Such strategies will be of two main types: those that relate primarily to grammatical competence (e.g. how to paraphrase grammatical forms that one has not mastered or cannot recall momentarily) and those that relate more to sociolinguistic competence (e.g. various role-playing strategies [...]).

(Canale & Swain 1980, pp. 29–30)

Bachman (1990, p. 84) pursues a concept similar to Widdowson's and describes communicative language ability (CLA) as "consisting of both knowledge, or competence, and the capacity for implementing, or executing that competence in appropriate, contextualized communicative language use" [our emphasis]. He therefore proposes a framework including the following components: language competence, strategic competence, and psychophysiological mechanisms (i.e. the neurological and psychological processes in the actual execution of language as a physical phenomenon such as sound).

According to Bachman (ibid.), language competence comprises a set of specific knowledge components utilised in communication via language, while strategic competence embraces the mental capacity for implementing the components of language competence in contextualized communicative language use. He strongly links this competence with sociocultural knowledge and real-world knowledge. In this respect Bachman is in agreement with Widdowson as well as Canale & Swain, who also emphasise the procedural and functional notion of communicative competence with regard to contextualised and meaningful communication.

2.2 Communicative competence in the CEFR

The discussion of the importance of "The user/learner's competences" when carrying out communicative tasks in the CEFR (Council of Europe 2001, pp. 101–108) had an impact on the development of the E8 Speaking Test. The CEFR suggests that all human competences contribute in one way or the other to the ability to communicate and may therefore be regarded as aspects contributing to communicative competence (ibid., p. 101). Nevertheless, those competences closely related to language in the description of communicative language competence are especially emphasised (ibid., pp. 108–130).

According to the CEFR, users/learners employ their general capacities ... together with more specifically language-related communicative competence in order to fulfil communicative purposes. Thus, communicative competence has the following components:

linguistic competences; sociolinguistic competences; pragmatic competences. (Council of Europe 2001, p. 108)


2.2.1 Linguistic competences

In the description of linguistic competence, the CEFR refers to the main components of linguistic competence defined as knowledge of, and ability to use, the formal resources from which well-formed, meaningful messages may be assembled and formulated (Council of Europe 2001, p. 109).

From the six linguistic competences mentioned in the CEFR, the following three have been selected to be assessed in the E8 Speaking Test: lexical, grammatical, and phonological competence.

Lexical competence

Lexical competence is described as the knowledge of, and the ability to use, the vocabulary of a language, [and] consists of lexical elements and grammatical elements (Council of Europe 2001, p. 110).

Lexical elements include fixed expressions and single word forms.

Single word forms are single words that may have several distinct meanings (e.g. tank: a container or an armoured vehicle), open word classes (nouns, verbs, adjectives etc.), and lexical sets (days of the week, weights and measures etc.) (Council of Europe 2001, p. 111).

Fixed expressions consist of several words that are used and learnt as wholes. In unplanned speech they are the building blocks of fluency, which demonstrate communicative capacity. According to the CEFR, fixed expressions include sentential formulae, phrasal idioms, fixed frames, and fixed phrases (Council of Europe 2001, pp. 110–111).

In speaking, fixed expressions are often called lexical phrases, formulaic language, conversational routines or prefabs. They range from chunks of language to complete sentences that are not assembled word by word in the speech act but have been pre-assembled through repeated use. Therefore, they can be accessed easily and quickly and thus contribute to fluency (Thornbury 2009, p. 23).

Examples from performances recorded during the piloting phase of the E8 Speaking Test:

How are you? I don't know what you mean. Have a nice day. Good bye. (sentential formulae, also called social formulas)

What I don't like is ..., Please can I have ..., I'd like to ..., What do you think (about) ..., I hope I will ... (fixed frames or phrases, also called sentence frames)

Well ..., Right ..., I agree ..., You see ..., Yeah ... (discourse markers)

Three times a week, brush my teeth, go to a party/cinema/friend (collocations)

Finally, grammatical elements that belong to closed word classes range from articles to prepositions and particles (for a complete list see Council of Europe 2001, p. 111). Lexical competence is assessed in the dimension Vocabulary in the E8 Speaking Test (see p. 33f).


Grammatical competence

Grammatical competence is defined in the CEFR as the

knowledge of, and ability to use, the grammatical resources of a language.

ability to understand and express meaning by recognising and producing well-formed phrases and sentences.

(Summarised from Council of Europe 2001, pp. 112–113)

The CEFR does not provide a model for grammar or for the organisation of words into sentences but it identifies parameters [...] which have been widely used in grammatical description: elements, categories, classes, structures, processes, and relations (Council of Europe 2001, p. 113).

In the E8 Speaking Test, grammatical competence is assessed in the dimension Grammar (see p. 31f).

In the assessment of both lexical and grammatical competence, the nature of language in unplanned speech is acknowledged (see p. 11).

Phonological competence

The CEFR describes phonological competence as the knowledge of, and skill in the perception and production of:

sound units, phonetic composition of words, sentence phonetics, and phonetic reduction.

(Summarised from Council of Europe 2001, pp. 116–117)

Apart from the test takers' success in making use of an appropriate lexical and grammatical range and the accuracy of the performance, the naturalness and clarity of the language used are assessed as the third component of linguistic competence in the E8 Speaking Test. A performance is considered natural and clear if the pronunciation is intelligible and the pronunciation and intonation make it sound natural. In order to achieve this, performances have to reach a certain level of fluency.

According to McCarthy, fluency is shown through

lexico-grammatical and phonological flow,

apparently effortless accurate selection of elements by individual speakers,

the ability of participants to converse appropriately on topics,

the ability to retrieve chunks,

interactive support by each speaker to the flow of talk, and

helping one another to be fluent.

(Summarised from McCarthy 2006b, p. 5)

In this way, McCarthy maintains, speakers are able to express ideas appropriately and coherently, speak at a suitable pace, and pause at expected points.

In the E8 Speaking Test, fluency features as phonological flow in the sense that natural and clear pronunciation and intonation should make it possible for speakers of English to understand the test takers' utterances without having to guess at their meaning.


Phonological competence as described above is assessed in the dimension Clarity and Naturalness of Speech (see p. 30f).

Aspects of confluence, which also contribute to fluency, are assessed as discourse competence and design competence (see p. 10ff) in the dimension Task Achievement and Communication Skills (see p. 28f).

2.2.2 Sociolinguistic competence

Sociolinguistic competence is described as the knowledge and skills required to deal with the social dimension of language use. "[T]he matters treated ... are linguistic markers of social relations; politeness conventions; expressions of folk wisdom; register differences; and dialect and accent." (Council of Europe 2001, p. 118)

In the context of the E8 Speaking Test the focus is primarily on the linguistic aspects of sociolinguistic competence. With regard to the limitations of the testing situation, the FL level and age of the target group, as well as the relationship of interlocutors and test takers, sociolinguistic competence is restricted to linguistic markers of social relations and politeness conventions.

Therefore, linguistic markers that come into play in the E8 Speaking Test are most likely greetings on arrival and leaving as well as introductions and conventions for turntaking. Politeness conventions (Council of Europe 2001, p. 119) are dependent on the task and the descriptor being tested and therefore restricted to expressing and responding to feelings such as surprise, happiness, sadness, interest and indifference, offering things or actions etc.

The use of linguistic markers of social relations and politeness conventions as evidence for sociolinguistic competence will most likely become observable as lexical elements and is thus assessed in the dimension Vocabulary, while the test takers' ability to follow conventions for turntaking is assessed in the dimension Task Achievement and Communication Skills in the E8 Speaking Test (see p. 28f).

Other aspects of sociocultural and sociolinguistic competence like folk wisdom, register differences or dialect and accent go beyond the level and knowledge of the target group and are therefore not considered in the assessment of the E8 Speaking Test.

2.2.3 Pragmatic competence

According to the CEFR, [p]ragmatic competence deals with the ability to organise, structure and arrange messages (discourse competence), to perform communicative functions (functional competence), and to sequence turns according to interactional or transactional schemata (design competence) (Council of Europe 2001, p. 123).

Discourse competence

In agreement with the definition by Canale & Swain (1980), the CEFR defines discourse competence as the ability ... to arrange sentences in sequence so as to produce coherent stretches of language (Council of Europe 2001, p. 123).

In the E8 Speaking Test this competence can best be demonstrated in the monologue part (see p. 17), where the test takers are most likely to produce text that features whole sentences. In other parts of the test (interview or dialogue, see pp. 16–17) the nature of interactive talk will primarily trigger the use of short idea units and incomplete sentences, strings of short phrases, as well as short turns (see also p. 11).


Functional competence

Functional competence refers to the use of spoken discourse ... for particular functional purposes (Council of Europe 2001, p. 125).

In the context of the E8 Speaking Test, functional competence comes into play as the already mentioned ability to make use of known expressions (see p. 8ff) in meaningful exchanges, surfacing as communicative capacity or communicative strategies (see pp. 6–7).

According to the CEFR (Council of Europe 2001, p. 128), the qualitative aspects which determine functional success are fluency (the ability to articulate, to keep going, and to cope when one lands in a dead end) and propositional precision (the ability to formulate thoughts and propositions so as to make one's meaning clear).

The aspect of fluency which describes the ability to articulate is assessed in the dimension Clarity and Naturalness of Speech, while the ability to keep going and to cope when one lands in a dead end, as well as the ability to formulate thoughts and propositions so as to make one's meaning clear, are assessed in the dimension Task Achievement and Communication Skills (see p. 28f).

Design competence

Design competence describes the ability to sequence turns according to interactional or transactional schemata (Council of Europe 2001, p. 123). In the context of the E8 Speaking Test this ability will surface primarily in the dialogue part (see p. 17), where the possibility for turntaking provides opportunities for the effective use of language to organise the discourse (also see p. 8ff) and thus the chance to demonstrate discourse competence typical of interactive and unplanned speech.

2.3 The nature of language in unplanned speech

According to Thornbury speech production takes place in real time and is therefore essentially linear with planning time ... severely limited. Therefore, in speaking words follow words, phrases follow phrases etc. and to compensate for limited planning time ... [speakers] ... use what is called an add-on strategy. This results in chaining together short phrases and clause-like chunks, which accumulate to form an extended turn. (Thornbury 2009, pp. 2 & 4)

Similar to Thornbury's description of speech production, Luoma depicts unplanned speech as

spoken spontaneously, mostly in reaction to other speakers;

containing short idea units, incomplete sentences, strings of short phrases, or short turns;

delivered in a formal to informal register.

(Summarised from Luoma 2004, p. 12)

Although the test takers in the E8 Speaking Test are given a short time to think about their speech act (see p. 40), their performances cannot be called planned in Luoma's terms. Planned speech, according to Luoma (2004, pp. 12–13), is rehearsed, consists of well-thought-out points or opinions, and has been said many times before.


Therefore, the nature of language in unplanned speech is considered in the assessment of spoken performances in the E8 Speaking Test, especially in the assessment of the test takers' linguistic competence, i.e. Vocabulary and Grammar. It is acknowledged that both vocabulary and grammar in unplanned speech are limited in their range as well as in their accuracy compared to writing, and that performance effects [which] include the use of hesitations (erm, uh ...), repeats, false starts, incomplete utterances, and syntactic blends (i.e. utterances that blend two grammatical structures, as in "I've been to China in 1998") (Thornbury 2009, p. 21) are natural.


3 Test development

The following Model of speaking test performance (Figure 1) describes the components which constitute the E8 Speaking Test as well as a range of factors and processes that have an impact on the performance and its assessment and have therefore been considered in the development of the E8 Speaking Test.

It shows that the construct had to be related to communicative competences, task-specific knowledge and skills, and test taker characteristics, which had to be considered in task development, in decisions on setting, and in how to pair the test takers. Moreover, it shows that test administration, including setting, interlocutor characteristics and training, has a bearing on the performance, as does its assessment based on the assessment scale.

Thus it clarifies that the test performance is at the heart of an interrelated system which required organised decision making in order to provide testing and assessment tools that would most likely bring about valid and reliable results. The following sections will describe the process of test development in the light of these aspects.

Figure 1: Model of speaking test performance (adapted from Fulcher 2003, p. 115)


3.1 Issues of standardisation

In this chapter we will discuss general issues of standardisation such as reliability and validity and of test design, including the rationale for the paired approach towards testing speaking. Moreover, we will touch on issues of standardisation through training and the use of the standardised assessment scale for the E8 Speaking Test.

While validity deals with the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure (Henning 1987, p. 89), test reliability describes "[t]he actual level of agreement between the results of one test with itself or with another test" (Davies et al. 1999, p. 168).

Bachman suggests considering reliability and validity as complementary aspects to identify, estimate and control factors that affect scores:

The investigation of reliability is concerned with answering the question, "How much of an individual's test performance is due to measurement error, or to factors other than the language ability we want to measure?" and with minimizing the effects of these factors on test scores. Validity, on the other hand, is concerned with the question, "How much of an individual's test performance is due to the language abilities we want to measure?" and with maximising the effects of these abilities on test scores. (Bachman 1990, pp. 160–161)

Acknowledging the importance of the issues of reliability and validity, it was a major concern of the E8 testing group to identify sources of error in the assessment of the test takers' communicative language ability and to develop a test and an assessment tool that would be capable of identifying the language abilities to be measured as reliably as possible.
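
One common way to make the notion of reliability as a "level of agreement" concrete is to compare the scores two assessors give to the same performances. The following is a minimal sketch in Python, not the procedure used for the E8 Speaking Test: the score lists, the function names `pearson` and `exact_agreement`, and the choice of these two statistics are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long score lists with at least two entries")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


def exact_agreement(xs, ys):
    """Share of performances to which both assessors gave the same score."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally long, non-empty score lists")
    return sum(1 for x, y in zip(xs, ys) if x == y) / len(xs)


# Hypothetical scores from two assessors for the same six performances
# (invented data, not taken from the report).
assessor_a = [3, 5, 2, 4, 6, 1]
assessor_b = [3, 4, 2, 5, 6, 1]

print("correlation:", round(pearson(assessor_a, assessor_b), 3))
print("exact agreement:", round(exact_agreement(assessor_a, assessor_b), 3))
```

A high correlation indicates that the two assessors rank the performances similarly, while exact agreement shows how often they award identical scores; the two values can diverge, which is one reason both are often reported together.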

3.2 Standardising the content

According to Kerlinger (1973, p. 458), content validity is the representativeness or sampling adequacy of the content (the substance, the matter, the topics) of a measuring instrument. Additionally, Csépes & Együd (2004, p. 19) maintain that speaking tests should present test takers with tasks that resemble as closely as possible what people do with the language in real life, and Davies et al. (1999, p. 34) argue that the test content must include an adequate sample of the target domain [spoken language] to be measured. An adequate sample involves ensuring that all major aspects are covered and in suitable proportions.

Since the E8 BIST and the ANC follow the construct of the CEFR in differentiating speaking into Oral Production and Spoken Interaction, the teaching and testing of speaking have to deal with two skills. These should receive equal attention in terms of tuition time and be weighted appropriately in proportion to the skills of reading, listening, and writing in the general assessment of learners, as suggested in the ANC. As a consequence, the duality in speaking as a skill has brought about two components in the E8 Speaking Test: the monologue and the dialogue part (see p. 17ff).

The following sections will discuss issues of validity referring to the content of the E8 Speaking Test with reference to the achievement measures of oral production and spoken interaction realised in the tasks.


3.2.1 Task

The content of the tasks in the E8 Speaking Test is defined by the topics, the communicative function determining the task types, the spoken text types, and the rubrics.

Topics and context

In real life, speaking occurs in a given context. Therefore, the tasks are based on topics that provide contexts as close to real life as possible and avoid those that might put some test takers at a disadvantage because the task achievement requires specific knowledge of the world and/or cultural knowledge. Moreover, topics that require a great deal of creativity or imagination to accomplish the task or that might easily trigger stereotypes are not used either.

The topics of the E8 Speaking Test follow curricular guidelines and the contexts the tasks create reflect the world knowledge and experience of 14-year-old test takers (also see p. 23). Moreover, great care is given to design tasks that are interesting for the test takers in order to support motivation and participation.

The topic and the context determine

- the purpose of communication (see Speaking Purpose/Communicative Function, p. 35ff),
- the audience to be addressed, which defines the interactional relationship (see Primary Audience, p. 35ff),
- the kind of spoken text type to be produced (see p. 18ff), and
- the expected content (see pp. 36-40, topics from the ANC).

The constituents of the context are fleshed out in the construct space that defines the test's framework in terms of its components. The expected content, i.e. the information the test takers are expected to present, is prompted in content points, textual or visual stimuli.

Prompts are kept short to avoid validity problems through too much reading input, but no important information needed to complete the task successfully is omitted. Possible content points are clear and easy to recognise.

For example, if the prompt asks the test takers to describe the village/town where they live, the prompt could look like the following:

The village/town where you live

Say
- what this village/town is like.
- what things or places are interesting to see.
- what you can do there.
- why you like this village/town.
- why you do not like this village/town.
- what you would change about this village/town.

The prompt and the content points give away as little as possible of the language needed to accomplish the task, so that the test takers can make use of their own ideas and language. However, making use of the language in the prompts is not prohibited and does not have a negative impact on the assessment.

If drawings, graphs, or pictures are used to illustrate the prompt and/or to stimulate speaking, these are provided in excellent quality so that the test takers are not put at a disadvantage.

Input texts that are used as a part of the prompt should be authentic. If this is not possible, adapted texts must provide correct and appropriate English. Input texts must be as short as possible and they must not exceed 50 words so that reading is kept to a minimum. The language level of input texts must also be at or preferably below the tested level and therefore not exceed CEFR level A2 (see Council of Europe 2001, p. 24).
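The 50-word limit for input texts lends itself to an automatic check during prompt editing. The following is a minimal sketch, not part of the E8 procedures: the function names are hypothetical, and counting whitespace-separated tokens is an assumption about how "words" are counted.

```python
MAX_INPUT_WORDS = 50  # upper limit for input texts stated in the report


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def input_text_within_limit(text: str, limit: int = MAX_INPUT_WORDS) -> bool:
    """Return True if the input text does not exceed the word limit."""
    return count_words(text) <= limit
```

A check of this kind only covers length. Whether an input text is authentic and at or below CEFR level A2 still has to be judged by the prompt writers and trainers.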

Rubrics

All rubrics that provide the instructions for the tasks are written in English (see p. 41). The language used has been piloted and revised several times. It is below the candidates' expected level of language competence and therefore easily understandable for test takers who have mastered the lower range of CEFR level A2.

Task types

Speaking tasks can be set in a way that the speakers are asked to produce speech events independently or collaboratively (Kahn 2008 quoted in Wong & Waring 2010). For this reason the E8 Speaking Test has been developed in a way that the test takers are given the opportunity to produce language in a monologue and a dialogue part.

The literature (Brooks 2009, Együd & Glover 2001, Taylor 2001) discusses various aspects of the individual and paired peer-approach towards testing and performance assessment and emphasizes the advantages of the latter as being more natural and less stressful for the test takers, thus producing better and more elaborated language. Therefore, the tasks in the E8 Speaking Test have been designed in a way that the interactional relationship the test takers are engaged in is symmetrical, i.e. the test takers communicate with each other about familiar topics and the power-distance relationship between test takers and an adult interlocutor is reduced to a minimum.

The interview

At the beginning of the test an interview serves as a warm-up with the goal of breaking the ice and making the participants and the interlocutor familiar with each other. Following standardised instructions, the interlocutor asks three to five interview questions to create a friendly conversation between the interlocutor and the two test takers, similar to the standard teacher-pupil interaction in class that the test takers should be familiar with. The questions in the interview are global ones that will most likely elicit short answers; questions that require knowledge of the world, embarrassing or ambiguous questions, or yes/no questions are not used. Typical interview questions are: "What's your name?", "Where do you live?", "What are your hobbies?", "When do you usually get up in the morning?", "What do you normally do after school?", "What do you normally do at the weekend?"

In the interview, questions about topics that feature in the monologue or dialogue part of the prompt set are not permitted to avoid repetition and putting test takers at an advantage or disadvantage because they could repeat language used previously.


The monologue

In the monologue part each test taker is offered a choice of three topics. The three topics vary in E8 BIST descriptors and text types. Moreover, they do not overlap with the second test taker's topics or with the topics used in the interview or dialogue part.

Each monologue is triggered by six to eight content points that should provide a guideline for the test takers. However, they are not restrictive, and it is not compulsory to make use of or to cover all of them. That is, the test takers can also follow their own ideas in the presentation of the selected topic.

Standardised repair questions are provided for all content points and some additional ones are added. These are used by the interlocutors to support the test takers in case of breakdown of communication or lack of ideas.
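The structural requirements for the monologue part (three topics per test taker, six to eight content points per topic, no overlap with other parts of the prompt set) could be checked mechanically when a prompt set is drafted. The sketch below is an illustration only: the class and function names are hypothetical, and comparing topic titles is a crude stand-in for the judgement of whether two topics overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonologueTopic:
    title: str
    content_points: List[str]


def check_monologue_choices(choices_a, choices_b, other_topics):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    for label, choices in (("A", choices_a), ("B", choices_b)):
        # Each test taker is offered a choice of three topics.
        if len(choices) != 3:
            problems.append(f"Test taker {label}: {len(choices)} topics offered, expected 3")
        # Each monologue is triggered by six to eight content points.
        for topic in choices:
            n_points = len(topic.content_points)
            if not 6 <= n_points <= 8:
                problems.append(f"{topic.title}: {n_points} content points, expected 6 to 8")

    # Topics must not overlap between the two test takers ...
    titles_a = {t.title.lower() for t in choices_a}
    titles_b = {t.title.lower() for t in choices_b}
    for title in sorted(titles_a & titles_b):
        problems.append(f"'{title}' is offered to both test takers")

    # ... or with topics used in the interview or dialogue part.
    elsewhere = {t.lower() for t in other_topics}
    for title in sorted((titles_a | titles_b) & elsewhere):
        problems.append(f"'{title}' also appears in the interview or dialogue part")
    return problems
```

Variation in E8 BIST descriptors and text types across the three topics is not covered here, because it requires the descriptor and text type assigned to each topic.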

The dialogue

In the dialogue part the two test takers communicate with each other. The interaction is triggered by visual and textual cues that provide ideas for the interaction. However, these do not restrict the test takers in their freedom to make their own choices in the elaboration of the given topic.

The dialogue part consists of a short and a long dialogue because certain E8 Standards descriptors suggest a short format, while others lend themselves to be used in long dialogues (see Construct Space p. 36ff).

In both formats visual stimuli and short verbal prompts (see p. 43ff) such as key words or question starters are used to trigger interaction about the topic. Additionally, the prompts encourage the test takers to use their own ideas.

The test takers are not bound to make use of the question starters or key words in the prompt, but successful interaction of the test takers is at the heart of E8 Standards assessment. Therefore, the prompts are considered to be a thought-provoking medium, while the test takers have the freedom to carry out their own solutions. On the one hand, this gives the test takers the opportunity to make use of their linguistic and creative potential. On the other hand, test takers who are used to following guidelines in their interaction are offered the opportunity to make use of the stimuli offered by the prompt.

There is no standardised prompting by the interlocutor in the short dialogue because this would result in the interlocutor taking on a part in the dialogue, which is not desired. Therefore, if one test taker does not communicate, the interlocutor asks the other test taker to do so. If this also fails, the long dialogue is started.

In the long dialogue the interlocutors are trained to facilitate the interaction in a standardised way without being intrusive. Contrary to the monologue, where the interlocutor asks questions or gives stimuli in cases of breakdown, the interlocutor remains silent and passes repair question slips to test takers who do not ask questions. This opens up the opportunity for one test taker to read out the question and for the other to respond. Ample piloting of repair questions has shown that this is less intrusive than the interlocutor's direct repair questions, which redirect the interaction into a conversation between interlocutor and test taker.


Text types

The communicative genres or, more precisely, the text types used in the E8 Speaking Test are listed in the Construct Space (see p. 36ff) in alphabetical order, as they cannot be automatically matched with any particular E8 BIST descriptor or topic. Instead, they have to be meaningfully selected in task design to match the E8 BIST descriptor, the topic, and the task type.

The following text types are used in the E8 Speaking Test:

Descriptions

Descriptions say what things, people, places, pets, pictures etc. are like. Mostly descriptions follow a typical structure: first they identify the phenomenon and then they describe it in parts, qualities, and/or characteristics. In most cases, descriptions will suggest the use of the present tense, adverbs and adjectives, or comparisons to help picture the person or object, and employ the five senses in saying how something or someone looks, sounds, feels, smells, or tastes.

Expository discourse

Expository discourse presents a topic. It does not report events or focus on a performer's actions, but presents a topic in a static way. The information is logically organised around a theme, e.g. "Positive and negative sides of life in a big town".

Expository discourse presents a problem, some arguments, a solution, and probably an evaluation of the solution. In the context of the E8 Speaking Test, expository discourse is limited to topics that do not require concrete factual knowledge. That is, the test takers may be asked to present familiar topics such as extreme sports, healthy nutrition, the life/problems of teenagers, the environment etc. but not a specific geographical region or place, or an event in history etc.

Narratives or Stories (true or invented)

Narratives and stories are predominantly constructed in past tenses because the events they relate usually happened before someone tells them. The tenses used can be simple past, past continuous tense, and past perfect tense. Narratives or stories often focus on a series of events that are mostly presented in a linear sequence.

In speaking, narratives and storytelling often use direct speech to make the listeners feel, think, and share experiences through the real dialogues of the participants. A lot of direct speech will change the nature of the language in the monologue (e.g. incomplete sentences, phrases, chunks, tense switches).

If storytelling is triggered by pictures, present tenses can also be used.

Personal reports

Personal reports describe the features of events within the experience of the test takers (e.g. reports about holidays, weekends, sports weeks, excursions, family meetings or feasts etc.). They generally follow a similar structure (what, when, where, with whom, why, how) and use facts to explain something or give details about a topic. Moreover, they can be descriptive. Reports are mainly delivered in the past. If the reports focus on rituals in the test takers' daily lives, present tenses can also be used.


Personal statements

In personal statements the test takers present themselves; they give reasons, talk about their plans and/or give explanations for them. The age and the life experience of the test takers limit the topics that can be matched with this text type to those referring to future education/job/life/ideal place to live/ideal partner or family/free time or holiday preferences etc. A personal statement will mostly feature present tense, future tense, or the conditional.

Argumentative discourse

Traditionally, argumentative discourse is a form of interaction in which the individuals maintain opposing positions. In the context of the E8 Speaking Test, however, the test takers will most likely share similar opinions. Thus, argumentative discourse will trigger arguments of equal actors engaged in personal, social interaction rather than those of an abstract or conflicting nature, and it differs from informal discussion in its more personal content.

Functional discourse

Functional discourse refers to speech acts that engage the test takers in carrying out concrete social functions such as greeting and departing, expressing feelings like surprise, joy, regret, interest etc., making arrangements or transactions in shops, post offices, getting information about travel, asking and telling the way etc.

Thus, the audience of the functional discourse would normally originate in the public domain (e.g. shop personnel, police, drivers, conductors, waiters etc.). Although the test takers are familiar with these audiences from carrying out role plays in English lessons at school, in the E8 Speaking Test they will not take on the roles of adults. Functional discourse will therefore exclusively feature tasks that ask two teenagers to carry out social functions.

Informal conversations

In informal conversations personal information is exchanged between people who are familiar with each other and who are from the personal or educational domain about topics arising in their daily lives.

Informal discussions

In an informal discussion the test takers will present arguments and information about a familiar topic from different points of view and they may also phrase a recommendation as to how to solve a problem or react to a certain situation. Informal discussions in the E8 context can only touch the personal or educational domain of the test takers and will exclusively focus on familiar topics. Informal discussions differ from argumentative discourse in the level of formality and in product orientation (recommendation, problem solving).

Text types and audience

The text types mentioned in the previous sections will require different audiences to be addressed. Although EFL lessons offer multiple opportunities to simulate situations from the personal, educational and public domain, the testing situation must not put test takers at a disadvantage by putting them into roles that are very different from their range of experience or which might make them feel ashamed or shy and thus prevent them from showing what they know.

Therefore, the E8 Speaking Test does not go beyond the typical scenarios the test takers are used to experiencing in EFL education. Moreover, they will not be asked to take on roles that do not reflect their real age, i.e. they will not be asked to speak as parents, teachers, ticket clerks etc.

Prompt writing and prompt difficulty

Validating the content of a test must also be concerned with the question of whether the tasks are a representative sample of what the test takers are familiar with from lessons that teach speaking and whether the difficulty of the tasks is similar.

In order to take care of this aspect, the prompts are exclusively written by practising qualified English teachers who are also trained as interlocutors and assessors. They are familiar with the test construct, the theoretical model of speaking, and the test specifications. Moreover, they apply their experience as experts who have current and intensive contact with the target group.

Prompt writing is carefully trained and follows guidelines. The teachers collaborate in pairs and produce first drafts, which are screened by tandem pairs. In this way four qualified teachers have given their feedback on a prompt set before it is screened by expert E8 BIST trainers, who moderate editing if necessary. The completed prompt sets are pre-piloted with learners of the target group by the authors, who function as interlocutors and assessors in the pre-piloting. During this phase final adjustments to repair questions and prompts can be made. If this is the case, additional screening by the trainers is required.

A second pre-piloting takes place during the second interlocutor/assessor training, when these prompt sets are used with pupils from a school other than that of the prompt author. Again, adjustments can be made before these prompt sets are stored in the BIFIE item bank where they are ready to be piloted a last time under real test conditions in the year before the actual exam.

Prompt writers are instructed to generate prompt sets that are similar in construct and ideally identical in the anticipated difficulty for both test takers and in comparison to other prompt sets. However, there are some variables that cannot be controlled. Test takers who have never encountered the topic or even thought about it or who do not yet have an opinion about it and have to perform on it in the course of the test may certainly find the task more demanding than test takers who have already had experiences with the required content. However, if the appropriate strategic competences asked for in the task (e.g. describing, turn-taking, questioning etc.) are available to test takers they can succeed in such tasks, even if they do not have a wide range of linguistic resources available for the topic.

3.3 Standardising the setting

Similarly to the task, the setting of a test has an impact on the performance. It therefore seems important to provide a setting that will interfere as little as possible with the performance and that will thus create as little measurement error as possible.

By setting we understand the local performance conditions defined by the physical setting as well as the role of the interlocutors, their characteristics, and their training.


3.3.1 Interlocutor/Assessor characteristics

The interlocutors/assessors working in the E8 Speaking Test project are all practising teachers with a qualification in English as a main subject and teaching at Austrian schools. They must be trustworthy and accurate and be able to work under pressure. In the Austrian E8 Speaking Test project, teachers are carefully trained to act efficiently in three roles: prompt writer, interlocutor, and assessor.

All three roles require continuous and carefully sequenced input and practice. This is why all parts of the face-to-face and online training sessions feature each role in a progressive mode, i.e. skills are presented and practised continuously and systematically.

3.3.2 Interlocutor/Assessor training

The training stretches over a period of approximately six months and consists of four phases:

1. Phase One: Face-to-Face (F2F) Meeting 1
2. Phase Two: Online Training 1
3. Phase Three: Online Training 2
4. Phase Four: F2F Meeting 2

Phase One: F2F Meeting 1

In Phase One the trainees are made familiar with communicative competence in the CEFR, already mentioned in Chapter 2, and the E8 Speaking Test Specifications, which will be dealt with in detail in the next chapter (see p. 35ff). To set them up in their role as assessor, the trainees are acquainted with the construct of the test and the CEFR scales for the assessment of spoken interaction and oral production before they study the descriptors of the E8 Speaking Assessment Scale (see p. 42), after which they assess several examples of video-recorded, benchmarked E8 Speaking Test performances.

Throughout the training assessors are given feedback on their assessor behaviour in relation to the group and can thus reflect and adjust their assessments towards a more homogeneous behaviour with the help of anchor performances and justifications that can be used for individual standardisation practice. Moreover, in the assessment of the E8 Speaking Test multiple ratings will be collected through the online assessment of a representative sample of performances by the whole assessor population, and thus assessor behaviour (harshness or leniency) will be adjusted through multi-faceted Rasch analysis.
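The report does not state which facets enter the multi-faceted Rasch analysis. As a sketch only, a common formulation of the many-facet Rasch model with test takers, assessment criteria, and assessors as facets is the following, where all symbols are introduced here for illustration:

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that test taker n is awarded band k on criterion i by assessor j, B_n is the ability of the test taker, D_i the difficulty of the criterion, C_j the severity of the assessor, and F_k the difficulty of the step from band k-1 to band k. Because C_j is estimated separately, a harsh or lenient assessor can be compensated for when ability estimates are derived.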

To prepare for their future role as an interlocutor the trainees are provided with guidelines for interlocutors and interlocutor behaviour, followed by reflected individual and group analyses of video recordings of perfect and flawed interlocutor behaviour.

In order to practise their dual role as interlocutors and assessors the trainees learn to set up the seating arrangement according to a standardised plan (see p. 23) and carry out test simulations with their peers, who provide feedback on individual interlocutor behaviour in group discussions.

Finally, to help them with the tasks they have to carry out in Phase Two, they are presented with the intricacies of prompt writing discussed in the previous chapter, and are provided with guidelines on how to go about writing their own prompts.


Phase Two: Online Training 1

In the second phase of training the trainees work together in pairs to design and produce one speaking prompt set (see p. 43ff). To assist them in this task, a second pair of trainees (tandem pair) moderate and edit the prompt set before it is sent to the trainers for a final stage of moderation.

Once the prompt set has been passed by the trainers, the trainees carry out trial speaking tests in one of their schools with eight pairs of fourth year pupils. A selected number of these speaking performances showing various competence levels are assessed, justified and reflected upon in pairs. At this point the interlocutors' and assessors' experiences with the prompts are discussed and analysed. If trialling has uncovered flaws in the prompts' quality, more screening and editing takes place.

Phase Three: Online Training 2

During this stage of training the trainees assess eight to ten speaking performances that are made available to them via a secure online platform. The trainees submit their scores on the benchmarked performances to the trainers, thereby providing data to determine inter-rater and intra-rater reliability.
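The report does not specify which statistics are computed from the submitted scores. As one possible illustration, the sketch below compares a trainee's band scores with the benchmark scores using the proportion of exact agreement and Cohen's kappa, which corrects that proportion for chance agreement. The function names and the example scores are hypothetical.

```python
from collections import Counter


def exact_agreement(ratings_a, ratings_b):
    """Proportion of performances that received the same score from both sources."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(ratings_a)
    p_observed = exact_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of the two marginal proportions, summed over scores.
    p_expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both sources use one identical score")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical band scores for eight benchmarked performances.
trainee = [3, 4, 2, 5, 3, 4, 1, 2]
benchmark = [3, 4, 3, 5, 3, 4, 1, 1]
print(exact_agreement(trainee, benchmark))  # 0.75
```

The same functions could be applied to two rating passes by the same trainee over the same performances, which would give an indication of intra-rater consistency rather than inter-rater agreement.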

Phase Four: F2F Meeting 2

In this final phase of training the trainees go through a phase of standardisation with an emphasis on the implementation of the prompt sets in a mock E8 Speaking Test, referred to as prompt familiarisation, and on interlocutor behaviour. Three or four prompt sets from within the whole training group and/or the item bank, selected by the trainers, are pre-piloted with a larger cohort of pupils that the trainees have not met before to simulate an authentic E8 testing situation. Each trainee receives at least one opportunity to act as an interlocutor and conduct a speaking test. During the subsequent tests, the trainees either assume the role of assessors, whereby they assess several speaking performances, or they observe their peers acting as interlocutors. They thus receive feedback on their interlocutor behaviour and can adjust it if necessary.

In a future F2F meeting, shortly before the actual E8 Speaking Tests take place, the trained interlocutors and assessors go through another standardisation and prompt familiarisation phase.

3.3.3 Physical setting

In addition to the test procedure that is guided by standardised interlocutor behaviour, the physical setting of the E8 standards test has to be standardised in order to create an environment that will make the results reliable because all test takers are tested in a very similar set-up.

The tests are carried out at the test takers' school, which provides them with a familiar environment. The head teachers of the schools are asked to choose rooms that are well lit, well-aired, friendly, and undisturbed. They are also asked to leave two chairs outside the testing room for the next test takers waiting for their turn.

The interlocutors arrange chairs and two tables so that they have enough space to arrange their testing materials and that the test takers sit facing each other and facing the interlocutor (see Figure 2). In the dialogue part the arrangement should allow for the test takers to look at each other.


The assessor sits outside this arrangement but must be able to see the test takers' faces.

Figure 2: Test seating arrangement (showing the positions of the candidates, the interlocutor, and the assessor)

The interlocutors arrange instructions, prompt cards, question cards, and repair slips in a way that they can find them easily and quickly and they make sure the repair slips cannot be seen by the test takers before they need to use them. A (stop-)watch (with second hand) is brought by the interlocutors for time measurement. This is done as discreetly as possible to avoid creating a feeling of time pressure for the test takers.

3.4 The test takers

There are many challenges interlocutors and assessors of oral performances are faced with. The previous sections have provided insight into how the training aims at minimizing the impact of interlocutor behaviour and physical setting on the performance. This section will look more closely at test taker variables and the factors that influence their performance as well as how the E8 Standards Test deals with these issues.

Davies et al. suggest that a wide range of variables may significantly influence test performance or produce measurement error and thus affect the validity of the assessment. These may include language background, age, sex, educational background, background knowledge, affective reactions to test taking, level of proficiency in the target language and familiarity with the test method (1999, p. 208).


While physical/physiological variables⁴ like age and sex can be considered to have little bearing on performances because the test takers are all of similar age and attending year 8 classes of Austrian schools, cognitive variables like language background, educational background, and background knowledge may have a stronger diversifying impact on performances.

Affective or situational reactions to test taking such as motivation, physical disposition, as well as factors such as learning strategies and styles, attitude, extroversion, introversion, anxiety, personality, or risk taking (Bachman 1990, Davies et al. 1999, Kunnan 1995) can hardly be controlled in a testing situation. Nevertheless, the following insights from research have been taken into consideration in E8 Standards Testing: Berry (1994) researched the effect of introversion and extroversion on paired speaking test performance. The results suggest that introverts perform better in homogeneous pairs and in tests with interlocutors than if paired up with extrovert test takers. Luoma (2004) suggests that test takers who know each other very well tend to speak less than those who are not too familiar with each other and that acquaintanceship has a stronger impact on performance than a mismatch in proficiency level. For this reason, it seemed appropriate that the test takers and/or their teachers should be allowed to choose the peer partners for the E8 Speaking Test in order to rule out disadvantages caused by the individual characteristics discussed above.

We thus expect the effects of personality, culture-specific variables, proficiency levels, and acquaintanceship to be reduced to a possible minimum.

As much as introversion may have an impact on a test taker's performance, lack of motivation may also result in scores that do not match the actual ability of a test taker. As the E8 Standards Test is a low-stakes test with no direct bearing on the test takers' school career, lack of interest in a good performance can prevent test takers from showing what they actually can do. In order to avoid the undesired situation where examinees do not approach the testing situation in the expected manner and thus threaten the validity of results (Henning 1987), it must be the aim to take any possible measure to foster motivation and to avoid hostile or negative reactions to the content and format (Fulcher 2003). This can be achieved by making the test takers familiar with the content and the format.

At this point it has to be acknowledged that most test takers will not have experienced many formal tests in speaking. Apart from rote-learnt role-plays, rehearsed presentations or book and film presentations, which become part of continuous assessment, teachers hardly ever test speaking. Moreover, teachers do not often assess their pupils' pair work. Therefore, it can be expected that the situation of being tested in speaking will be a new experience for most of the learners.

However, it is hoped that teachers will make use of published testing materials in order to support their pupils' familiarity with the test format and the test procedure. Prompts, video-recorded pilot tests, and the instructions used by the interlocutors are available at the BIFIE homepage, and it is therefore possible for teachers to show and practise the testing situation with the learners (available at: https://www.bifie.at/node/1821).

Moreover, test takers who have attended eight years of education in Austria and used the accredited course books will have encountered similar speaking tasks and should have reached a level of linguistic competence of at least A2 according to the CEFR in oral production and spoken interaction in the FL as suggested in the curriculum. In favourable situations they may even have reached CEFR level B1. Additionally, the test takers' sociolinguistic competence should cover the linguistic markers of social relations and politeness conventions asked for in the E8 Standards Test.

⁴ Provisions for test takers with special needs are still to be developed (e.g. instruction cards in large fonts, technical support for the hearing impaired etc.).

More culture-specific aspects of sociocultural and sociolinguistic competence, which are part of EFL and certainly important, go beyond the possibilities of standardised testing and are therefore not a requirement for the E8 Standards Test.

Like linguistic competence, making conscious and strategic use of pragmatic competence (discourse competence, functional competence, and design competence) is required by the curriculum and thus the test takers are expected to be competent in engaging in interactive speaking tasks that ask them to carry out various communicative functions. Moreover, the test takers will most likely have held planned presentations in their EFL lessons and thus show design competence.

Finally, the presence of a trained person who encourages communication in a standardised way is the big advantage of the E8 Speaking Test. While in all other skills the test takers are left alone with the task, speaking provides the opportunity for the interlocutor to promote participation. Moreover, the contribution of the paired set-up to motivation and participation generally has a positive impact on the performance.

3.5 Standardising the construct: construct validation

According to Alderson et al. (2004, p. 171), "[c]onstruct validation refers to what the test scores actually mean, what they tell us about the examinees."

In the following section, the construct space of the E8 Speaking Test will be presented and the assessment criteria will be explained in order to demonstrate the provisions that have been made to support the construct validity of the E8 Standards Test and to explain what information can be gained from its results.

As the discussion about the interpretation of the quality of oral performances had to begin before the selection or development of a test, the purpose of the E8 Standards Test, i.e. what it was hoped to learn about the test takers' competences, has guided the development of an assessment scale which describes how the competences are displayed at certain levels.

In order to link the competences defined in the construct to objectives that could be judged in a standardised way, a procedure similar to that described by Hanny (2000) was pursued:

First, the purpose of the assessment (finding out about communicative competences) was matched with objectives (can-do statements) as suggested by the E8 BIST. This was achieved by correlating the E8 BIST with the objectives of the ANC and the can-do statements of the CEFR. Next, assessment criteria that would address the objectives were developed. The band descriptors were revised several times as the criteria of the E8 Speaking Assessment Scale were piloted in Assessor Training sessions and used in a Benchmarking Conference. The categorisation of criteria for the assessment of the test takers' communicative competences as defined by the construct led to the development of a four-dimensional assessment scale.


Figure 3: Developing assessment criteria

The assessment scale comprises the following dimensions:

Task achievement and communication skills (assessing pragmatic and sociolinguistic competences),

Clarity and naturalness of speech (assessing linguistic and pragmatic competences),

Vocabulary and Grammar (both assessing linguistic competences).

The above model only partially describes the evidence that is collected in the assessment of spoken performances. The purpose of the assessment encompasses the evidence that can be judged according to the construct criteria that are based on the speaking model, i.e. the communicative competences. These criteria surface as the can-do statements of the E8 BIST and the CEFR respectively. However, the E8 BIST cannot be judged in a vacuum. Thus they appear in combination with content that has to be judged in combination with the E8 BIST, a list of topics from the ANC, the text types appropriate for eliciting the content and the competences, the communicative functions of the tasks, and the audiences to be addressed. All these components frame the construct space, which gives information about what has to be considered in the assessment.

Validity evidence on the basis of the construct space was reflected on before and examined after the preliminary assessment scale was established. The construct space and the scale were revised according to information collected during the process of piloting and benchmarking. Tables 1–4 (see pp. 36ff) display the Construct Space that was considered in the development of the E8 Speaking Assessment Scale (Table 6, p. 42), which comprises four dimensions and seven bands for each dimension.

In the development of the Speaking Assessment Scale, clear criteria for the assessment in four dimensions and at seven bands, i.e. levels, were established. Descriptions that have been derived from CEFR scales are available for bands 1, 3, 5 and 7 for each of the four dimensions. Bands 2, 4 or 6 are awarded if a performance is better than one of the described bands but not good enough to be awarded the next higher one. These band descriptors are used to guide the process of assessment. Each dimension receives a score within the seven bands. Although assessment criteria do not completely eliminate variations between assessors, a well-designed scale in combination with careful training of the assessors can reduce the occurrence of discrepancies (Moscal 2000).
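The banding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `award_band`, its two parameters, and the treatment of band 7 as the ceiling are hypothetical and not part of the E8 Speaking Assessment Scale itself, and the assessor's judgement of which descriptor is met remains a human decision.

```python
# Illustrative sketch only: names and parameters are assumptions,
# not part of the published E8 Speaking Assessment Scale.

DESCRIBED_BANDS = (1, 3, 5, 7)  # bands with descriptors derived from CEFR scales
TOP_BAND = 7


def award_band(highest_described_band_met: int, exceeds_descriptor: bool) -> int:
    """Return the band for one dimension.

    highest_described_band_met: the highest described band (1, 3, 5 or 7)
        whose descriptor the performance fully meets.
    exceeds_descriptor: True if the performance is better than that
        descriptor but not good enough for the next described band.
    """
    if highest_described_band_met not in DESCRIBED_BANDS:
        raise ValueError("descriptors exist only for bands 1, 3, 5 and 7")
    if exceeds_descriptor and highest_described_band_met < TOP_BAND:
        # Intermediate bands 2, 4 and 6 sit between two described bands.
        return highest_described_band_met + 1
    return highest_described_band_met


if __name__ == "__main__":
    print(award_band(3, False))  # meets the band 3 descriptor
    print(award_band(3, True))   # better than band 3, not yet band 5
```

The sketch makes explicit that bands 2, 4 and 6 carry no descriptors of their own and are defined relative to the described band below them.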

A sound understanding of the test construct and its assessment scale is likely to improve both inter-rater and intra-rater reliability. Therefore, assessors must be made familiar with the construct and the scales, making use of benchmarked performances and written justifications which exemplify consistent assessment based on set criteria.

The discussion and comparison of written justifications by the assessors and the benchmarks are important in two ways: firstly, they help the assessors adjust to the standardised assessment scales and the common understanding of their bands, and secondly, they reveal possible implicit criteria that may have been applied by the assessors but that are not stated in the scales (e.g. "In my class this would be a top performance, so this must be a high band ..."). Identifying the implicit criteria they may have been using can help the assessors refine their understanding and application of the scales for future assessments.

In this way, the justifications of the benchmarked performances demonstrate how the assessment criteria can be directly related to performance criteria. Moreover, they exemplify the differences between the categories at certain levels through performances. Thus, the aim of the assessor training is to guide the assessors in such a way that they arrive at independent scores based on the band descriptors within a maximum of +/- one band for a given performance.
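The training target of staying within one band of a benchmarked performance can be checked mechanically. The following Python sketch is a hedged illustration: the function names, the dictionary layout, and the example scores are assumptions made here for demonstration and do not reproduce any procedure or data from the assessor training.

```python
# Illustrative sketch only: function names, data layout and example
# scores are assumptions, not taken from the E8 assessor training.


def within_tolerance(assessor: dict, benchmark: dict, tolerance: int = 1) -> dict:
    """For each dimension, report whether the assessor's band lies
    within `tolerance` bands of the benchmarked band."""
    return {
        dimension: abs(assessor[dimension] - benchmark[dimension]) <= tolerance
        for dimension in benchmark
    }


def agreement_rate(assessor: dict, benchmark: dict, tolerance: int = 1) -> float:
    """Share of dimensions on which the assessor is within tolerance."""
    checks = within_tolerance(assessor, benchmark, tolerance)
    return sum(checks.values()) / len(checks)


if __name__ == "__main__":
    # Hypothetical bands for one performance on the four dimensions.
    benchmark = {"task": 5, "clarity": 4, "grammar": 3, "vocabulary": 5}
    assessor = {"task": 6, "clarity": 4, "grammar": 5, "vocabulary": 4}
    print(within_tolerance(assessor, benchmark))
    print(agreement_rate(assessor, benchmark))
```

A check of this kind only flags where an assessor departs from the benchmark; the written justifications are still needed to find out why.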

One method of further clarifying the E8 Speaking Assessment Scale and raising the level of awareness and recall shortly before the test is the use of anchor performances. Anchor performances are a set of carefully selected benchmarked responses that illustrate the nuances of the categories. These will be presented at the standardisation meeting prior to the mock test before the actual test and made available on a secure platform so that the assessors may refer to the anchor performances shortly before the assessment process. This should reinforce the standardisation and give the assessors the chance to remember the anchor performances and the assessments when this information is really needed. This opportunity for individualised recall is important because the organisation of the E8 Speaking Test requires a time-slot of several weeks within which the assessors may have to operate.

3.5.1 The Assessment Scale

In order to make the criteria of the assessment scale result in valid interpretations of a response it is necessary for the criteria to be related to the purpose of the assessment. Therefore, the criteria should be defined in a way that any given response would receive the same assessment regardless of who the assessor is or when the response is assessed.

Therefore, the descriptors of the analytic assessment scale that assessors work with in the context of the E8 Speaking Test have been carefully designed and linked with the construct to report about the test takers' abilities in four dimensions (see p. 41f).

The E8 Speaking Assessment Scale is applied by the assessors in situ, i.e. the assessment has to be achieved during the test takers' performances. This constraint was considered in the initial development of the E8 Speaking Assessment Scale and taken seriously in the adaptations of the scale during piloting and benchmarking, which resulted in a shorter and more user-friendly version.

To provide feedback on the test takers' communicative competence, the most significant competences needed for speaking as defined in the test construct (see p. 36) are assessed in the following dimensions: task achievement & communication skills, clarity & naturalness of speech, grammar, and vocabulary. Due to the above-mentioned constraint of in situ assessment, the three parts of the E8 Speaking Test (the monologue, the short dialogue, and the long dialogue) are assessed holistically, i.e. the test takers are awarded one score on each of the four dimensions. The following interpretative descriptions of the four dimensions of the E8 Speaking Assessment Scale add to the reliability of results in the sense that the judgements are based on defined categories and band descriptions.

Task achievement & communication skills

In task achievement and communication skills the information the test takers provide (propositional precision, in all parts), the quality of the narrative (thematic development, primarily in the monologue part) as well as the ability to interact with a partner (turntaking, primarily in the dialogue part) are assessed.

Propositional precision refers to the information that is communicated in the performance as well as to the successful completion of a communicative speech act. In propositional precision we ask ourselves: What is the information we get like? Is it detailed, concrete, limited, or more or less non-existent?

In the monologue part the test takers are asked to give information about a given topic. In addition they are provided with content points. Thematic development primarily refers to the monologue part. It deals with the way the speaker develops a speech act with respect to the given theme and concerns the elaboration of ideas and the narration. If individual ideas (main points) are expanded with relevant detail, thematic development has been very successful.

At the other end of the scale, in basic statements at word or word group level, themes cannot be developed.

From the linear design of the prompts we can expect the test takers to address the content points in the sequence that they appear on the prompt cards. However, the order is not set and therefore test takers may incorporate them into their spoken production in a random order. The content points are to be seen as guiding points for the test takers, to help them to speak freely for two minutes about their chosen subject, but they are not mandatory and test takers are not penalised for not addressing them. The assessor must concentrate on the overall amount of information that the test taker is able to pass on and its quality and evaluate it according to the assessment scale. We expect test takers to talk about the topic they have chosen and to give information that is relevant to the topic. Test takers may even choose one content point only, but if they give varied information on it they can still reach high bands.

The repair questions provide a guideline of what we would expect the test takers to talk about in a sufficiently solved task. As the test takers are supposed to produce a flow of discourse in the monologue section, and not interact with the interlocutor, it will not be possible to assess the true level of the candidates' communication skills here. If, however, they do interact by asking for the translation of a German word into English (e.g. "What is Schläger in English?"), they should receive the support necessary to carry on. What we can expect in this section is the use of discourse markers such as well; like; actually; generally; of course; you know; these will reflect the level of the test takers' competence in communication skills.

In turntaking we assess the test takers' ability to interact with each other. This can be seen as the ability to begin, maintain, and end a conversation. The test takers may use prefabricated chunks, stock phrases, discourse markers, or formulaic language in doing so. If the test situation does not allow for beginning or ending the conversation, the lack of evidence for this does not necessarily lead to downgrading. If effective turntaking has been found in the conversation, high bands can still be awarded.


In the short dialogue we can expect the test takers to exhibit turntaking skills in order to achieve the task, which may be an invitation, an excuse, a purchase, a decision-making process (e.g. which film to watch) etc. We can thus expect the test takers to show, in a guided way, the extent to which they are able to initiate, maintain and close a conversation and how effective they are when doing this. Good speakers will have no problems formulating the necessary questions to accomplish the task. Utterances containing suggestions (e.g. Would you?), agreement (e.g. Me too.), or disagreement (e.g. No, I don't.) and their quality will also indicate communicative competence. Other indicators of communicative competence will be the use of stock phrases such as of course and not at all and the frequency of their use.

In the short dialogue the test takers are asked to accomplish a functional discourse. The detail of information may be limited by the task; therefore, the successful completion of the communicative function is the element we are assessing. The functional aspect of the short dialogue requires the test takers to come to a defined result. Bearing this in mind, it is likely that the test takers will refer to all the points in the prompt, because they should succeed in fulfilling the function that is required.

The long dialogues are guided by question prompts or key words that serve the same function as the content points in the monologue. They are stimuli but not compulsory items to be dealt with. If the test takers develop a conversation about the topic following their own ideas, the task can still be rich in the quality of information we get.

Unlike the linear design of the monologue, the long dialogue prompt is cyclical and there is no telling which content point (if any) a candidate will address first. As in the monologue section, it is not mandatory to address all the content points. At E8 we can expect good speakers to discuss many of the points, perhaps even all of them, and to discuss them in some detail. However, given that the test takers should interact with each other, and may in some cases even interrupt each other, it is less likely that they will have the opportunity to provide much detailed information before they are confronted with another point by their partner.

As soon as one candidate has started the conversation and the other candidate has replied, the decision to initiate, maintain, and end parts of the conversation lies in the hands of the individual candidates, unless there is a marked imbalance or breakdown of communication and the interlocutor intervenes. Speakers with good communication skills will try to provide a good balance between their verbal input using learnt phrases such as I think; In my opinion etc. We can expect good speakers to use phrases such as Me too; I agree/disagree; Really?; Cool etc. when reacting to their partner's utterances. And finally stock phrases such as And what about you? What do you think? What's your opinion? will be employed by good speakers to encourage verbal output from their conversation partners.

Generally speaking the prompts, content points, key words, and question prompts are there to make the test takers talk. If they find their own ways of solving the task and the information we get is appropriate and rich this is equally valuable.

In the assessment of task achievement & communication skills the test takers are allocated one of seven bands.

Band 7 performers give detailed information and are able to expand main points by relevant new elements. They are effective in turntaking.


Band 5 performers give concrete information that is clear and they develop a straightforward narrative in the monologue part. They achieve basic turntaking and can initiate, maintain and close a conversation using stock phrases.

Band 3 performers give limited information and in the monologue they give a simple list of points at sentence or word-group level. They can ask questions effectively in the dialogue parts. The test takers may partly rely on the interlocutor's support through repair questions to keep going or to come up with some more information.

Band 1 performers give very little information and cannot go beyond simple statements or negations at word or word-group level in the monologue part. This will mostly result from the fact that they cannot develop a narrative independently and rely on the interlocutor's repair questions to come up with some information. They make attempts to ask questions (e.g. raising intonation) but are not effective in questioning. The interlocutor may have to use the repair question cards to keep the dialogue going.

Clarity & naturalness of speech

A performance is considered natural and clear if the pronunciation is intelligible and the intonation makes it sound natural. In order to achieve this, performances have to reach a certain level of fluency and phonological flow. The natural flow of language in fluent speech is accompanied by the seemingly effortless selection of elements by an individual speaker and the ability of the other participant(s) to converse appropriately on topics. In doing so, the participants of fluent conversations retrieve chunks and provide interactive support to the flow of talk, helping each other to be fluent and creating confluence in the conversation. Thus they are able to express ideas appropriately and coherently, speak at an appropriate pace, and pause at expected points.

In the E8 Standards tests, clarity and naturalness of speech surfaces as phonological flow in the sense that natural and clear pronunciation and intonation should make it possible for native speakers of English to understand the test takers' messages without making compromises or too many guesses on meaning.

In the monologues the speakers are expected to speak fluently and naturally for two minutes and their narrative should flow in the sense that it is as coherent and cohesive as unplanned speech can be. That is, we cannot expect elaborated, complex sentences or backward and forward referencing of the quality of a written text, but we expect the test takers to use simple connectors (and, but, because, first, then, later, at last, personal pronouns etc.) and possibly some stock phrases that highlight the beginning, the main part, or the end of their presentation (I have chosen the topic ..., the most important thing ..., what I like best is ..., all in all this was ..., finally I would like to say that ...). In dialogues, discourse markers (well, you know, right ...), formulaic speech (have a nice day, see you, and you ...) as well as prefabricated chunks and phrases (would you like a ...?, the thing is ..., are you with me? ...) make spoken language fluent and compensate for grammatical or lexical planning.

In the assessment of clarity & naturalness of speech the test takers are allocated one of seven bands.

Band 7 performances that sound clear and natural are fluent and spontaneous. The performances are delivered at a fairly even tempo and pauses are naturally placed. The speakers will produce longer stretches of language (especially in the monologue part) with pronunciation and intonation that make the performance sound natural and clear.


The performances of band 5 speakers show some degree of fluency, although some pausing for lexical or grammatical planning can be necessary. The speakers produce connected stretches of language that are long enough for pronunciation and intonation to sound intelligible, although sometimes with a foreign accent. At this level some mispronunciations that do not impair communication can be tolerated.

A band 3 performance is interrupted by noticeable pauses, hesitations and false starts, which sometimes cause breakdown of communication. The contributions are short and intelligibly pronounced, too short, however, to develop natural intonation. Foreign accent or mispronunciation may sometimes impair communication.

In a band 1 performance the speaker is very hesitant, which frequently causes breakdown of communication. This may not necessarily be caused by pronunciation problems, but the very short and isolated utterances or frequent mispronunciations may either not allow for an evaluation of pronunciation or make it hard for native speakers to understand the message.

Grammar

The scale for grammar comprises descriptors for range, control, and the clarity of the message. Therefore, the assessors evaluate the ability to make use of a range of grammatical structures, the level of their accuracy as well as their impact on the message. The focus is on grammatical forms that create meaning and that are reasonably correct to accomplish successful communication. In addition, the assessment of grammar in the E8 Speaking Test considers the nature of grammar in unplanned speech (see p. 11).

Although there is some planning time, speech production in the E8 Standards Test takes place in real time and is therefore considered to show the characteristics typical of unplanned speech. Thus, the performances are expected to be linear and the test takers will mostly use an add-on strategy of stringing short idea units together. While we generally expect complete sentences in the monologue, the dialogues will primarily feature incomplete sentences, word groups, short phrases, or chunks. We have to acknowledge that incomplete utterances (Could be), ellipsis (Sounds like a good idea), syntactic blends (utterances that blend two grammatical structures as in I've been to London last year), or vague language (kind of machine) are natural. Moreover, present, simple, or active verb forms, will, would, can, personal pronouns, and determiners are frequent; past forms, perfect forms, and the passive are rare.

In the context of E8 Speaking, grammatical range must be seen in relation to the above-described nature of grammar in unplanned speech and the standardised tasks of the E8 Speaking Test. On the one hand, we will expect the test takers to use structures that are meaningfully elicited by the task. On the other hand, spoken language produced in real time has its special features. The speaking prompts focus exclusively on familiar topics and have been designed in a way that all ability levels have a good chance to succeed in the speech act. Thus, they are as straightforward in their set-up as possible. However, this does not suggest that the response cannot exceed the complexity of the stimulus. Even if a task is simple in
