Workshops



 Workshop 1: How to develop and design valid, innovative and complex computer-based items?
 Workshop 2: Applying Test Score Equating Methods using R
 Workshop 3: Item banking for optimal tests
 Workshop 4: Comparative Judgement for Research and Practice: an Application of D-PAC
 Workshop 5: Large-scale performance assessments: problems and potentials

The workshop fee (€100, for one workshop only) includes one coffee break and lunch.

All pre-conference workshops will take place on November 8, 09:30 - 16:30. Please select only one workshop during registration.


Workshop 1


How to develop and design valid, innovative and complex computer-based items?

Discussion, sharing experiences and working with innovative item types in a digital environment


Presenters

Pia Almarlind, Assessment developer, Umeå University, Sweden.
Patric Åström, Assessment developer, Umeå University, Sweden.
Mattias Abrahamsson, Assessment developer, Umeå University, Sweden.

Biographies
Pia Almarlind is from Umeå University, Sweden. She has worked as an assessment developer since 2008 and had the privilege of being involved in the start-up of the national assessments in science for 15-year-old students. She is now responsible for the development process of the national assessment in chemistry and is also involved in the development of the assessments in biology and physics. In addition, she is project leader for two university-based projects: one developing a general working model for assessing and judging essays using teacher moderation, and one developing a digital training course for university teachers on principles of test development, assessment and grading in higher education. Pia regularly arranges and leads workshops for teachers on item and test development, assessment and grading. In 2010 she served as an expert consultant to the National Agency of Education in the development of the Swedish curriculum Lgr11, with a focus on the science curriculum. Before working at Umeå University she was a science and mathematics teacher at lower secondary school for 10 years.

Patric Åström is from Umeå University, Sweden. He has worked as an assessment developer since 2016. He is now responsible for the development process of the national assessment in biology and is also involved in the development of the assessments in chemistry and physics. He is also involved in a university-based project developing a digital training course for university teachers on principles of test development, assessment and grading in higher education. Before his time at Umeå University he taught science and mathematics at lower secondary school for 20 years.

Mattias Abrahamsson is from Umeå University, Sweden. He has worked as an assessment developer since 2009. He is now responsible for the development process of the national assessment in physics and is also involved in the development of the assessments in biology and chemistry. In 2012 Mattias worked as a teacher educator at the university. Before working at Umeå University he was a science and mathematics teacher at lower secondary school for 10 years.

Why AEA members should attend this workshop
The aim of the workshop is to gather people from different countries and contexts, with different perspectives, and to use the day to develop new ideas about developing and designing innovative and complex computer-based items in large-scale assessments, for example chat items and simulations. We want to offer a day that balances short presentations, creative practical work and constructive discussions. By working collaboratively on concrete item examples, both paper-based and computer-based, and thereafter discussing opportunities, constraints, challenges and threats, the participants will hopefully leave with a number of new, innovative item ideas for future development.

Who this Workshop is for
The workshop is designed to engage educational professionals (e.g. assessment developers, researchers and educators) from different countries who are working with, and/or have an interest in, designing innovative and complex computer-based large-scale assessments and items.

Overview

In Sweden the national tests are designed as paper-based assessments, but some test parts give students the opportunity to demonstrate their proficiency orally and practically. Different universities are responsible for developing the national tests, commissioned by the National Agency of Education. In 2016 some universities were assigned to start developing item examples for digital national tests. The idea is to introduce digital national tests gradually between 2018 and 2022. The tests are intended to measure student proficiency in relation to the Swedish curriculum and to serve as a support for consistent national assessment and grading.

A project group at Umeå University, which is responsible for the national science tests, has started work on a test model for a digital national test in science. The test model is supposed to be aligned with the curriculum and fulfil its national aim. The project group also wishes to fully explore the item types made possible in the digital sphere, where, for example, animations, film clips showing sequences of events, and simulations are available.

In this work the project group has been inspired by the released digital item examples from PISA (http://www.oecd.org/pisa/test/other-languages/) and SimScientists (http://simscientists.org/home/index.php). Inspiration also comes from ATC21S (Griffin, McGaw & Care (Eds.), 2012; http://www.atc21s.org/). The project group now wishes to be part of a larger network for further development work.

Questions that need to be answered include: what do different countries' test systems look like; how do they build and ensure the quality of items in a digital environment; and what do innovative digital item examples look like and how are they developed?

In the first session of the workshop the presenters will give an overview of the Swedish test system and present concrete examples of innovative paper-based science items from the tests.

In the second session the participants will have an opportunity to present concrete examples of innovative items and share their knowledge, experiences and issues concerning the development and design of different types of innovative items in a paper-based and/or computer-based test system. In this session the participants will also have the opportunity to answer the items, which will create space for wider reflection on their development potential.

The third session will focus on collaboration and the development of ideas. During this session the participants will work in small groups on practical tasks. The purpose is to develop concrete suggestions for selected items and to see how the digital format can be used to serve the purpose of the items, e.g. formulating items based on ideas presented in earlier sessions and developing items from new ideas.

In the fourth session each group will be given an opportunity to provide feedback on the items discussed in session 3 in terms of opportunities, constraints, challenges and threats.

Finally we will summarize the sessions by discussing what to bear in mind when developing different types of innovative computer-based items, how to move forward and how to create a future international network.

Preparation for the workshop
To make the day as constructive and dynamic as possible, we invite each participant to put together a sample of innovative item examples and send them to the presenters one week before the conference. We also invite each participant to prepare a short presentation, 10-15 minutes, of their own experiences of developing the items. We also recommend that participants bring the tools and equipment required to present their items and to give the other participants the chance to answer them.

Schedule


Time | Session | Presenter
0900 | Coffee and registration |
0930 | Welcome & introductions; outline of the workshop | Pia Almarlind, Patric Åström, Mattias Abrahamsson
0945 | Overview of the Swedish test system and presentation of innovative items in science, focusing on aim, format, structure and content | Presenters and participants
1100 | Break |
1130 | Presentations of innovative items, focusing on aim, format, structure and content; individual evaluation of some specific items | Presenters and participants
1300 | Lunch |
1400 | Practical work: collaboration and developing ideas; group presentations with concrete suggestions of ideas and constructive feedback | Presenters and participants
1530 | Break |
1545 | Summary and discussion about how to create a future international network | Presenters and participants
1630 | Workshop close |



Workshop 2


Applying Test Score Equating Methods using R


Presenters

Marie Wiberg and Jorge González

Biographies
Marie Wiberg is a professor of statistics with a specialty in psychometrics at Umeå University, Sweden. She has a long history of working with different types of achievement tests, including a college admissions test and large-scale assessments such as TIMSS and PISA. She has used the data from these tests in practical applications and also to develop a number of useful methods which can be applied to different achievement tests. Wiberg has published a number of innovative papers on test equating and initiated the development of the R package kequate, which is specialized for kernel equating.

Jorge González is an associate professor at the Department of Statistics, Pontificia Universidad Católica de Chile. He is a permanent consultant at MIDE UC, a measurement center in Chile which administers a large private national achievement test (SEPA). He has also been a consultant for DEMRE, the unit in charge of the Chilean university entrance test, and for the National Agency for Quality Education, which administers the national assessment test SIMCE. He has published a number of papers on equating in recent years and is the developer of the R package SNSequate.

González and Wiberg are the authors of the book Applying test equating methods using R, which was released in March 2017.

Overview
Equating aims to adjust test scores on different test forms so that the scores can be used interchangeably (González & Wiberg, 2017). Equating has a central role in large testing programs and constitutes an important step in the process of collecting, analyzing, and reporting test scores. Equating is important because it ensures a fair assessment regardless of the time, place or background of different test takers. This pre-conference workshop has two main goals. The first goal is to provide an introduction to equating. Through a number of examples and practical exercises, attendees will get both a conceptual and a practical understanding of various equating methods conducted under different data collection designs. The R software will be used throughout the session, with special focus on the packages equate, kequate, and SNSequate. The second goal is to provide the tools necessary to perform different equating methods in practice using the available R packages for equating. The training session follows the chapters in the book Applying test equating methods using R, written by the instructors and released by Springer in March 2017.
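To give a flavour of what such an analysis can look like, here is a minimal sketch of an equivalent-groups equating with the equate package and its bundled ACTmath data (score scale plus frequency counts for two forms). It is illustrative only and not taken from the workshop materials.

```r
## Minimal sketch: observed-score equating under an equivalent-groups design
## with the equate package, using its bundled ACTmath data. Illustrative only.
library(equate)

act.x <- as.freqtab(ACTmath[, 1:2])        # form X frequency table
act.y <- as.freqtab(ACTmath[, c(1, 3)])    # form Y frequency table

# Linear and (log-linear presmoothed) equipercentile equating of X to Y
eq.lin <- equate(act.x, act.y, type = "linear")
eq.eqp <- equate(act.x, act.y, type = "equipercentile", smooth = "loglinear")

# The concordance table maps each form X score to its form Y equivalent
head(eq.eqp$concordance)
```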

Content
The training session will start by introducing traditional equating methods and different data collection designs, and will illustrate how they can be performed with the R packages equate and SNSequate. Next, the attendees will be guided through the five steps of kernel equating: i) presmoothing, ii) estimating score probabilities, iii) continuization, iv) equating and v) calculating the standard error of equating, using the R packages kequate and SNSequate. The attendees will then be introduced to item response theory equating and will receive practical guidance on how to perform these methods using R. The workshop will end with practical recommendations and examples of performing equating in order to conduct a fair assessment regardless of the time, place or background of the test takers. This pre-conference workshop will provide the attendees with a broad knowledge of recent developments in equating as well as how they can be performed within the R environment. They will get a number of opportunities to familiarize themselves with the currently available equating R packages. Throughout the training session, examples and exercises will be provided and attendees will be encouraged to work through hands-on examples.
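As a toy illustration of step iii), the base-R sketch below continuizes two made-up discrete score distributions with Gaussian kernels and equates by inverting the continuized CDF. The score probabilities and bandwidths are invented for the example; in a real analysis they come from presmoothing and a bandwidth search, which is exactly what kequate and SNSequate automate.

```r
## Toy sketch of kernel continuization and equating (cf. von Davier, Holland
## & Thayer, 2004) in base R. Probabilities and bandwidths are made up here.
scores <- 0:20
r <- dbinom(scores, 20, 0.55)              # form X score probabilities
s <- dbinom(scores, 20, 0.60)              # form Y score probabilities
hX <- 0.6; hY <- 0.6                       # hand-picked bandwidths

# Gaussian-kernel-continuized CDF of a discrete score distribution
kcdf <- function(x, scores, p, h) {
  mu <- sum(scores * p)
  sig2 <- sum((scores - mu)^2 * p)
  a <- sqrt(sig2 / (sig2 + h^2))           # preserves mean and variance
  sapply(x, function(xx)
    sum(p * pnorm((xx - a * scores - (1 - a) * mu) / (a * h))))
}

# Equated score e_Y(x) = G^-1(F(x)), found by numerical inversion
equate_score <- function(x) {
  Fx <- kcdf(x, scores, r, hX)
  uniroot(function(y) kcdf(y, scores, s, hY) - Fx, c(-5, 25))$root
}

round(sapply(scores, equate_score), 2)     # form Y equivalents of scores 0-20
```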

Intended audience
Researchers, graduate students, practitioners and others with an interest in how to conduct equating in practice. An introductory statistical background as well as experience with R is recommended but not required.

Preparation for the workshop
Attendees are expected to bring their own laptop with R installed together with the latest versions of the R packages equate, kequate, and SNSequate. Electronic training materials will be provided to the attendees.

Schedule


Time | Session | Presenter
0900 | Coffee and registration |
0930 | Welcome & introductions; outline of the workshop | Marie & Jorge
0945 | The equating principles, designs, classical equating methods | Jorge
1100 | Break |
1130 | Kernel equating methods | Marie
1300 | Lunch |
1400 | Item response theory equating | Jorge
1530 | Break |
1545 | Practical recommendations in order to provide a fair assessment | Marie
1630 | Workshop close |



Workshop 3


Item banking for optimal tests


Presenters

Angela Verschoor and Caroline Jongkamp

Why AEA members should attend this workshop
The workshop offers an introduction to item banking and its applications for test assembly from a practical point of view. Participants will gain insight into the dos and don'ts of using an item bank for the purpose of developing assessment instruments, and will receive practical guidelines for using metadata and psychometric theory to assemble optimal tests based on an existing item bank. Participants will gain hands-on experience in using automated tools to build linear or adaptive tests, based on Item Response Theory (IRT) or Classical Test Theory (CTT). The main features of these applications will be addressed in the workshop. Participants will be able to understand and assess the usefulness of item banking in their own work.

Who this Workshop is for
The workshop is aimed at those who want to know more about item banking and test assembly, with a focus on applications. Participants may be novice or more experienced users. No prior knowledge is required to attend the workshop, although some knowledge of CTT and IRT would be helpful.

Participants will practice using software on some examples and are invited to bring their own (Windows) laptops for the practical exercises.

Overview

The workshop starts with an introduction to item banking as part of the test development cycle, from the perspective of the test developer.

The first session of the workshop starts with some theory and best practices: why is item banking important, and how can we make item banking profitable for us, the test developers? An overview of item banking systems will be given, and participants will be encouraged to share their views and experiences with item banks.

In the second session participants will learn about the main features of the test construction process. The participants will practice specification of test requirements, using examples from the test construction experience of the presenters and, when available, from the participants themselves.

A brief introduction to Classical Test Theory and Item Response Theory will be given, with a focus on the use of both in item banks and in the optimization of the test design. The participants will work through hands-on exercises to assemble their own test based on an example item bank. Special attention will be paid to the development of multiple parallel test forms. The use of these multiple test forms will be discussed, as well as the requirements that must be fulfilled: test equating usually requires the use of anchor items, while security measures usually limit item use.
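To make "assembling an optimal test" concrete, the sketch below formulates a small automated test assembly problem as a 0/1 linear program with the lpSolve package: select 20 items from a simulated 2PL bank so that information at a cut score is maximized while respecting simple content constraints. The bank, the constraints and the choice of lpSolve are illustrative assumptions, not the tools used in the workshop.

```r
## Sketch: automated test assembly as a 0/1 linear program (lpSolve).
## The item bank below is simulated; constraints are purely illustrative.
library(lpSolve)

set.seed(1)
n_items <- 100
a <- rlnorm(n_items, 0, 0.3)                       # 2PL discriminations
b <- rnorm(n_items)                                # 2PL difficulties
domain <- sample(c("algebra", "geometry"), n_items, replace = TRUE)

theta_cut <- 0.5                                   # ability level of interest
p <- 1 / (1 + exp(-a * (theta_cut - b)))
info <- a^2 * p * (1 - p)                          # item information at the cut

# Exactly 20 items, at least 8 from each content domain
const_mat <- rbind(rep(1, n_items),
                   as.numeric(domain == "algebra"),
                   as.numeric(domain == "geometry"))
sol <- lp("max", objective.in = info,
          const.mat = const_mat,
          const.dir = c("=", ">=", ">="),
          const.rhs = c(20, 8, 8),
          all.bin = TRUE)

selected <- which(sol$solution > 0.5)
sum(info[selected])                                # maximized test information
table(domain[selected])                            # content balance check
```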

The topic of the third session will be computerized adaptive testing. The goals and usefulness of simulations for constructing CATs will be discussed. The measurement characteristics of a CAT can be studied and set before publishing it. This way, the performance of proposed selection algorithms and constraints can be studied and possibly altered to better suit the needs of the stakeholders.
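The kind of simulation referred to here can be prototyped in a few lines of base R. The sketch below runs a toy CAT on a simulated 2PL bank with maximum-information item selection and EAP ability estimation; all numbers are invented, and no claim is made that this matches the algorithms or software discussed in the workshop.

```r
## Toy CAT simulation: 2PL bank, maximum-information item selection,
## EAP ability updates. Entirely simulated and simplified.
set.seed(2)
n_items <- 200
a <- rlnorm(n_items, 0, 0.3); b <- rnorm(n_items)
true_theta <- 0.8
grid <- seq(-4, 4, length.out = 81)               # quadrature grid for EAP
prior <- dnorm(grid)
p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

administered <- integer(0); responses <- integer(0)
theta_hat <- 0                                    # provisional ability estimate

for (step in 1:20) {
  # Pick the unused item with maximum information at the current estimate
  p <- p2pl(theta_hat, a, b)
  info <- a^2 * p * (1 - p)
  info[administered] <- -Inf
  item <- which.max(info)

  # Simulate the examinee's response from the true ability
  resp <- rbinom(1, 1, p2pl(true_theta, a[item], b[item]))
  administered <- c(administered, item)
  responses <- c(responses, resp)

  # EAP update: posterior mean over the grid given all responses so far
  lik <- sapply(grid, function(th) {
    pr <- p2pl(th, a[administered], b[administered])
    prod(pr^responses * (1 - pr)^(1 - responses))
  })
  post <- lik * prior
  theta_hat <- sum(grid * post) / sum(post)
}

c(true_theta = true_theta, eap_estimate = round(theta_hat, 2))
```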

In the fourth and last workshop session various aspects of item bank maintenance and renewal will be discussed: how can we identify potential shortcomings in the available item pool, and what role do security issues and item renewal schemes play in a project? Developing long-term views on item banking will be the main topic of this session.

Preparation for the workshop
No special preparation is required; the workshop format will be interactive, allowing participants to discuss their own experiences and/or problems. If available, participants are encouraged to bring their own item bank data for discussion. It is the belief of the workshop leaders that sharing experience of applications will stimulate and enable participants to solve educational measurement problems that they encounter, or anticipate encountering, in their practice.

Schedule


Time | Session | Presenter
0900 | Coffee and registration |
0930 | Welcome & introductions; outline of the workshop |
0945 | Introduction to item banking as part of the test development process; hands-on exercise 1 | Presenters and participants
1100 | Break |
1130 | Main features of linear and (computerized) adaptive tests, test specifications and item bank requirements; hands-on exercise 2 | Presenters and participants
1300 | Lunch |
1400 | Using IRT and CTT in test assembly; hands-on exercise 3 | Presenters and participants
1530 | Break |
1545 | Item bank maintenance and renewal; hands-on exercise 4 | Presenters and participants
1630 | Workshop close and evaluation |



Workshop 4


Comparative Judgement for Research and Practice: an Application of D-PAC


Presenters

San Verhavert, Sven Maeyer, Renske Bouwer and Tine van Daal

This workshop is given by researchers from the D-PAC project (www.d-pac.be). D-PAC stands for the Digital Platform for the Assessment of Competences and uses Comparative Judgement (CJ). This four-year project started in 2014 and is a partnership between the University of Antwerp, Ghent University, and imec, an R&D and innovation hub in Belgium. Besides developing the tool and optimizing its usability, the team is doing research on the validity, reliability, and efficiency of CJ, as well as on how it can facilitate feedback for both individuals and organizations.

Why AEA members should attend this workshop
This workshop provides valuable insights for researchers and practitioners working with or planning to work with the method of comparative judgement.

Who this Workshop is for
Researchers and practitioners interested in CJ research and assessment. Knowledge of R (or Jamovi) is not required, but some basic notions might be useful.

Overview

Comparative Judgement (CJ) is an assessment method introduced by Pollitt (2012) and based on Thurstone's Law of Comparative Judgement (Thurstone, 1927). Assessors receive pairs of students' work and judge which of the two is better with respect to the competence under assessment. Based on these comparisons, performances can be ranked on a scale from low to high quality. Previous research has shown that comparative judgement results in reliable rank orders for a wide variety of competencies (McMahon & Jones, 2015) and for inter-board comparability studies (Bramley, 2007). In the last decade, CJ has been increasingly implemented in education and research. There is, however, considerable variation between CJ assessments, for example in the pair selection algorithm and in assessor expertise. This can have an impact on the minimum number of comparisons and assessors needed for a reliable and valid rank order. To make the best choices for a successful implementation of CJ, both educators and researchers need a basic understanding of its theoretical principles and of the techniques involved in the set-up and analyses. This workshop aims to fulfil this need. It is intended for researchers and practitioners who use, or intend to use, CJ in their research or assessment. By the end of this workshop the participants will be familiar with the basic principles and techniques behind CJ.

The workshop will consist of two parts: a theoretical part and a hands-on part. In the two morning sessions participants will get to know CJ in an interactive way. We will present different algorithms for the selection of pairs and their impact on the (reliability of the) rank order. Further, we will present the broad applicability of CJ. In the D-PAC project (www.d-pac.be), for example, we have applied and tested CJ for the assessment of a wide range of competencies at all levels of education, in peer and teacher assessments, as well as in HR contexts such as job selection. We have also applied CJ in the context of numerical cognition, audiology, and professional development. In these assessments we have experimented with different algorithms, different types of feedback, different numbers of assessors and comparisons, and differences in assessor expertise. Based on these experiences, we will discuss the requirements for a reliable and valid CJ assessment. In small groups, participants will discuss the possibilities of applying CJ in their own research or assessment practice and formulate specific research questions.

The two afternoon sessions will be hands-on, in which participants will set up their own CJ study in the D-PAC tool. We will show the functionalities of D-PAC, using a worked example, after which the participants will conduct their own assessment. Afterwards, participants will analyse the data that is generated by the tool. We will focus on the meaning and interpretation of the Scale Separation Reliability and the misfit statistics. We will guide the participants through all the necessary steps of CJ data analysis in Jamovi (www.jamovi.org), a graphical user interface built on top of R. Knowledge of R is not required, but some basic notions might be useful.
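For readers who want to see what such an analysis involves under the hood, the sketch below simulates comparative judgement data, fits a Bradley-Terry-type model with plain logistic regression in base R, and computes a simple scale separation reliability. It is a stand-in for illustration only: the data are simulated, a logit link is used instead of Thurstone's probit, and the SSR formula is the usual separation-style approximation rather than the exact statistic reported by D-PAC or the Jamovi module.

```r
## Sketch: scaling comparative judgement data with a Bradley-Terry-type model
## in base R, plus a separation-style reliability. Simulated data only.
set.seed(3)
n_scripts <- 15
true_quality <- rnorm(n_scripts)

pairs <- t(replicate(150, sample(n_scripts, 2)))   # random pairs of scripts
p_win <- plogis(true_quality[pairs[, 1]] - true_quality[pairs[, 2]])
win <- rbinom(nrow(pairs), 1, p_win)               # 1 if the first script wins

# Design matrix: +1 for the first script, -1 for the second;
# script 1 is fixed at 0 for identification
X <- matrix(0, nrow(pairs), n_scripts)
X[cbind(seq_len(nrow(pairs)), pairs[, 1])] <- 1
X[cbind(seq_len(nrow(pairs)), pairs[, 2])] <- -1
fit <- glm(win ~ X[, -1] - 1, family = binomial)

est <- c(0, coef(fit))                             # estimated script qualities
se  <- c(0, sqrt(diag(vcov(fit))))                 # SE of 0 for the reference

# Separation-style reliability: share of observed spread not due to error
ssr <- (var(est) - mean(se^2)) / var(est)
round(ssr, 2)
cor(true_quality, est)                             # recovery of simulated truth
```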

Preparation for the workshop
Participants should bring their laptops with the latest versions of R and Jamovi pre-installed. An internet connection will be required to distribute the necessary files.

Schedule


Time | Session | Presenter
0900 | Coffee and registration |
0930 | Welcome & introductions; outline of the workshop | All presenters
0945 | Comparative Judgement and underlying principles |
1100 | Break |
1130 | Requirements for a reliable and valid CJ assessment |
1300 | Lunch |
1400 | Hands-on: set-up of a CJ assessment and judging |
1530 | Break |
1545 | Hands-on: analyses and results |
1630 | Workshop close |



Workshop 5


Large-scale performance assessments: problems and potentials


Presenters

Rianne Janssen, Eef Ameel, Jetje De Groof and Alexia Deneire

Why AEA members should attend this workshop
Large-scale assessments are commonly limited to paper-and-pencil tests or their digital alternatives. However, for some competences performance assessments are deemed necessary, despite the fact that such assessments bring their own problems. In this workshop we discuss the use of large-scale performance assessments from different perspectives, encompassing theoretical considerations, quality assurance, examples of good practice, and issues brought up by the participants. AEA members should attend this workshop if they want to broaden their scope on large-scale assessments or if they need a practical framework to start implementing performance assessments themselves.

Who this Workshop is for
Policy-makers, test developers, psychometricians and experts in educational measurement.

Overview

Measurement issues in performance assessments

The measurement and scoring of student performance on performance tasks are not without problems. Specific psychometric issues to be dealt with are local dependence, multidimensionality and standard setting. Standard models from Item Response Theory (IRT) may not be fully appropriate, and alternative measurement models, such as models for the joint classification of items and persons, may be necessary. It will be illustrated how large-scale assessments of ICT literacy in the K-12 range have dealt with these issues.
Our goal is that participants will, on the one hand, understand the psychometric challenges of measurement and scoring in performance assessments and, on the other, see ways to tackle these issues.
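As one small illustration of how such issues can be inspected, the sketch below fits a unidimensional 2PL model to simulated responses with the mirt package and screens item pairs for local dependence with Yen's Q3 residual correlations. The data, the package choice and the |Q3| > 0.2 screening rule are assumptions made for the example, not part of the assessments described here.

```r
## Sketch: screening for local item dependence with Yen's Q3, using mirt
## on simulated 2PL data. All numbers and the 0.2 cut-off are illustrative.
library(mirt)

set.seed(4)
n_persons <- 1000; n_items <- 10
a <- rlnorm(n_items, 0, 0.25); b <- rnorm(n_items)
theta <- rnorm(n_persons)
P <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
dat <- (matrix(runif(n_persons * n_items), n_persons, n_items) < P) * 1

mod <- mirt(dat, 1, itemtype = "2PL", verbose = FALSE)
q3 <- residuals(mod, type = "Q3")          # pairwise residual correlations

# Item pairs whose residual correlation suggests local dependence
which(abs(q3) > 0.2 & upper.tri(q3), arr.ind = TRUE)
```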

Quality criteria for large-scale assessments of competences

In this part we discuss an evaluation matrix resulting from a research project commissioned by the Flemish Ministry of Education. The matrix aims to give a framework for evaluating the quality of large-scale performance assessments that focus on quality monitoring at the system level. Firstly, we offer an overview of the essential challenges that large-scale performance assessments are confronted with. Secondly, we offer insight into the building blocks and quality criteria of high-quality performance assessments. To get an idea of how those quality criteria take shape in real-life situations, we analyze a worldwide range of examples of performance assessments.
Our goal is that participants will get a sense of the continual trade-off one has to make between reliability, validity and feasibility when designing large-scale performance assessments. Prioritizing one of these components also means that sacrifices have to be made regarding the others.

Design and development of large-scale performance assessments

The design and development of large-scale performance assessments in practice is the focus of the third session. Based on experiences within the Flemish national assessment program, guidelines for test development have been set up, along with a detailed project timeline, in order to address the challenges performance assessments pose with respect to test construction, test administration, scoring, data analysis and standard setting. Guiding principles include the translation of attainment targets into specific performance goals for the performance task, organizational feasibility, extensive pilot testing and the development of clear evaluation criteria. Throughout the presentation, detailed examples of performance assessments will be shared.
Our goal is that participants will learn new ways to tackle the many problems posed by performance assessments, so that they may be inspired to consider including performance assessments in their own future assessments.

Discussion: Introducing large-scale performance assessments in your country? Why (not)?

The last time slot of the workshop is devoted to a plenary discussion on the costs and benefits of performance assessments. When is it advisable to introduce performance assessments in large-scale assessment programs, and when is it not? What are good reasons (not) to do so?
The goal of this session is to have an open discussion on the issues brought up by the participants to help them develop their own view as policy makers and test developers with respect to the use of performance assessments.

 

Preparation for the workshop
In order to prepare for the discussion, participants can bring their own examples of performance assessments, or descriptions of situations where the use of performance assessments is being discussed. Participants are also encouraged to read the policy brief (Tucker, 2015) of the recent report of the ETS expert commission on 'Psychometric considerations for the next generation of performance assessment' (Davey et al., 2015).

Schedule


Time | Session | Presenter
0900 | Coffee and registration |
0930 | Welcome & introductions; outline of the workshop |
0945 | Measurement problems and potentials of performance assessments |
1100 | Break |
1130 | Quality criteria for large-scale assessments of competences |
1300 | Lunch |
1400 | Design and development of large-scale performance assessments, with examples from the Flemish national assessment program |
1530 | Break |
1545 | Discussion: Introducing large-scale performance assessments in your country? Why (not)? |
1630 | Workshop close |