NEPS TBT: Work Package Technology Based Testing
As a consortium partner, the DIPF contributes to the planning and implementation of the National Educational Panel Study (NEPS). One focal point of this contribution is the work package on Technology-Based Testing (TBT).
Project Description
The Technology-Based Testing (TBT) work package is part of the NEPS methods group and is located at DIPF's Centre for Technology-Based Assessment (TBA Centre). It is under the scientific leadership of Prof. Dr. Frank Goldhammer, the scientific co-leadership of Dr. Daniel Schiffner, and the operational leadership of Dr. Lena Engelhardt. NEPS-TBT works closely with the Leibniz Institute for Educational Trajectories (LIfBi) and is concerned with innovative survey and test methods, for example computer- and Internet-based skills testing.
Project Objectives
The TBT work package supports the implementation of technology-based testing in NEPS, especially in the domains of reading and mathematics, with science-based services, project-specific adaptations of software products, and accompanying scientific research.
Project Phase 2023-2027
In addition to providing scientific services, NEPS-TBT aims to conduct accompanying scientific research on currently relevant topics in NEPS. In the current project phase, this includes:
- Co-design and implementation of proctored vs. unproctored online surveys
The focus is on experimentally investigating possible future online survey formats and their effects on, for example, processing behavior and data quality. The formats to be tested are meant to explore promising new possibilities in comparison with the classic one-to-one interview situation in the household: people could, for instance, complete the competency tests online accompanied by a virtually connected interviewer (proctored mode), or complete them independently in an online setting (unproctored mode). Indicators of potentially deviating processing behavior (e.g., prolonged inactivity or rapid guessing) are developed and read out at runtime, and appropriate prompts are designed and presented as interventions; a minimal sketch of this idea follows below. It will be tested whether such prompts can induce behavioral adaptations, and whether the different conditions allow an interpretation of the outcomes that is as valid as in the classical one-to-one setting.
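As an illustration only, the following Python sketch shows how such runtime indicators could be computed and turned into prompts; the event structure, thresholds, and prompt texts are assumptions for the example, not the NEPS implementation.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds; an operational study would calibrate these per item.
RAPID_GUESS_SECONDS = 3.0    # a response faster than this is flagged as rapid guessing
INACTIVITY_SECONDS = 120.0   # no response for this long is flagged as prolonged inactivity

@dataclass
class ItemEvent:
    item_id: str
    shown_at: float     # timestamp at which the item was presented
    answered_at: float  # timestamp of the response

def classify(event: ItemEvent) -> Optional[str]:
    """Return the name of a behavioral flag, or None if processing looks normal."""
    duration = event.answered_at - event.shown_at
    if duration < RAPID_GUESS_SECONDS:
        return "rapid_guessing"
    if duration > INACTIVITY_SECONDS:
        return "prolonged_inactivity"
    return None

def maybe_prompt(event: ItemEvent) -> None:
    """Present an intervention prompt when a flag is raised (print stands in for the test UI)."""
    flag = classify(event)
    if flag == "rapid_guessing":
        print("Please take your time and read each question carefully.")
    elif flag == "prolonged_inactivity":
        print("Are you still there? Please continue with the test.")

# A response given 1.4 seconds after presentation triggers the rapid-guessing prompt.
now = time.time()
maybe_prompt(ItemEvent("example_item_07", shown_at=now - 1.4, answered_at=now))
```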
- Diagnostic use of process indicators, e.g., to predict panel readiness
On the basis of log data, process indicators are to be extracted that can be used for modeling competency data and, for example, serve the research-based further development of existing scaling models. Process indicators can also inform aspects of data quality and missing coding, i.e., the assignment of missing responses to specific categories of missing values.
In addition, process data will be used together with outcome data and paradata, such as response times, to predict willingness to participate in follow-up surveys. This can yield risk profiles with regard to drop-out, from which implications for panel maintenance and incentivization can be derived; a minimal sketch of this prediction step follows below.
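A minimal sketch of this prediction step, assuming invented features and labels: per-respondent process indicators and paradata enter a logistic regression whose predicted probabilities serve as drop-out risk profiles. The feature set and the scikit-learn model are illustrative choices, not the NEPS procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-respondent features derived from process data and paradata:
# [share of rapid guesses, mean response time in seconds, number of omitted items]
X = np.array([
    [0.05, 42.0, 0],
    [0.40, 11.0, 6],
    [0.10, 38.0, 1],
    [0.55,  9.0, 9],
])
# 0 = dropped out of the follow-up wave, 1 = participated (illustrative labels)
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Risk profile for a new respondent: predicted probability of drop-out.
new_respondent = np.array([[0.30, 15.0, 4]])
p_dropout = model.predict_proba(new_respondent)[0][0]  # column 0 = class 0 (drop-out)
print(f"Estimated drop-out risk: {p_dropout:.2f}")
```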
Research Topics
- Investigation of different survey formats in online settings (e.g., proctoring, prompts)
- Investigation of processing behavior in online tests and effectiveness of behavioral interventions
- Predicting willingness to participate in follow-up surveys using multiple data sources, such as process indicators, outcome data, and paradata
- Creation and validation of innovative item and response formats for computer-based testing
- Analysis and validation of process-related behavioral data from competency measurements
- Modeling of processing speed
Science-based Services
- Provision of the CBA ItemBuilder and deployment software (IRTlib) for the delivery of computer-based test modules
- Study-specific support in the creation of test items and test modules
- Regular workshops as well as the development of a knowledge database to support item authors in the independent creation of computer-based test modules
- Prototypical creation of innovative and new item formats
- Coordination of requirements for the further development of the authoring tool CBA ItemBuilder and the deployment software (IRTlib) for use in NEPS
- Preparation and analysis of data sets (outcome and process data) and provision of existing TBA Centre tools for the analysis of the collected data
Completed Project Phase 2018-2022
The overarching aim of NEPS-TBT was the operation of scientifically grounded technology-based assessments that connect to international standards. Five central innovation aspects contributed to this objective: (1) stepwise updating of software components, (2) transfer of assessment innovations (e.g., innovative item formats and increased measurement efficiency) into panel studies, (3) cross-mode linking, including for heterogeneous assessment hardware (tablets, touch entry), (4) processing of all TBT data via log data, and (5) automated software testing and quality assurance. These foci of innovation were implemented in the following work packages:
- A strategy was developed for the testing and quality assurance of study-specific TBT modules. Automated testing enabled complete checks of data storage, served the quality assurance of fixed test assembly, and allowed adaptive test assembly to be checked.
- The development of a standardized editor enabled automated checking of codebooks and test definitions for multistage tests.
- A generic, study-independent concept was developed for coding missing responses, taking into account indicators from log data.
- Prerequisites for implementing psychometrically sophisticated test designs, such as adaptive algorithms, were prepared. The TBA Centre developed an infrastructure to configure CAT algorithms for test development from R; these were tested in simulation studies and then operatively integrated into the delivery software (see the item-selection sketch after this list).
- Following the principle of economy, result data and log data were not processed in parallel; instead, result data were derived from log data. To this end, criteria were developed for defining the completeness of log data (cf. Kroehne & Goldhammer, 2018). These developments were used to create generic tools that enable reproducible and transparent data processing (see the second sketch after this list).
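The following sketch illustrates the core of such a CAT algorithm, maximum-information item selection under a 2PL model, written here in Python rather than the R-based configuration described above; the item pool is invented for the example.

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, pool, administered):
    """Pick the unadministered item with maximum information at the current ability estimate."""
    candidates = {item: item_information(theta_hat, a, b)
                  for item, (a, b) in pool.items() if item not in administered}
    return max(candidates, key=candidates.get)

# Small illustrative pool: item id -> (discrimination a, difficulty b)
pool = {"i1": (1.2, -1.0), "i2": (0.8, 0.0), "i3": (1.5, 0.4), "i4": (1.0, 1.2)}
print(select_next_item(theta_hat=0.3, pool=pool, administered={"i1"}))  # -> "i3"
```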
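A second sketch, assuming a simplified log format, shows the principle of deriving result data from log events and applying a completeness criterion; the event names and the criterion itself are illustrative, not the generic tools mentioned above.

```python
# Simplified event log: every interaction is a timestamped record.
events = [
    {"ts": 10.0, "type": "itemStart",     "item": "r1"},
    {"ts": 14.2, "type": "responseGiven", "item": "r1", "value": "B"},
    {"ts": 15.0, "type": "itemEnd",       "item": "r1"},
    {"ts": 15.1, "type": "itemStart",     "item": "r2"},
    {"ts": 20.0, "type": "itemEnd",       "item": "r2"},  # no response logged
]

def is_complete(item_events):
    """Toy completeness criterion: an item needs both a start and an end event."""
    types = {e["type"] for e in item_events}
    return {"itemStart", "itemEnd"} <= types

def derive_results(events):
    per_item = {}
    for e in events:
        per_item.setdefault(e["item"], []).append(e)
    results = {}
    for item, evs in per_item.items():
        if not is_complete(evs):
            results[item] = "incomplete_log"  # data loss, distinct from a respondent omission
            continue
        responses = [e["value"] for e in evs if e["type"] == "responseGiven"]
        results[item] = responses[-1] if responses else "omitted"
    return results

print(derive_results(events))  # {'r1': 'B', 'r2': 'omitted'}
```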
Selected Publications:
- Kroehne, U., & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika, 45(2), 527–563. https://doi.org/10.1007/s41237-018-0063-y
- Deribo, T., Goldhammer, F., & Kröhne, U. (2022). Changes in the speed-ability relation through different treatments of rapid guessing. Educational and Psychological Measurement, online first. https://doi.org/10.1177/00131644221109490
- Deribo, T., Kröhne, U., & Goldhammer, F. (2021). Model-based treatment of rapid guessing. Journal of Educational Measurement, 58(2), 281–303. https://doi.org/10.1111/jedm.12290
- Kröhne, U., Deribo, T., & Goldhammer, F. (2020). Rapid guessing rates across administration mode and test setting. Psychological Test and Assessment Modeling, 62(2), 144–177. https://doi.org/10.25656/01:23630
- Engelhardt, L., Goldhammer, F., Naumann, J., & Frey, A. (2017). Experimental validation strategies for heterogeneous computer-based assessment items. Computers in Human Behavior, 76(11), 683–692. https://doi.org/10.1016/j.chb.2017.02.020
Completed Project Phase 2014-2017
For the domains of reading, mathematics, science, and ICT literacy, which are assessed repeatedly in the longitudinal NEPS, changes in the measurement instruments resulting from computerization were explored psychometrically on the basis of combined mode-effect and link studies as well as experimental mode variation (see, e.g., Buerger, Kroehne & Goldhammer, 2016). To this end, procedures for quantifying and correcting mode effects were investigated and applied to enable the introduction of computer-based competency testing in NEPS; the sketch below illustrates the basic idea of such quantification. Research and development in this project phase focused on using the properties of technology-based testing for the further development and optimization of NEPS competency tests (e.g., testing multiple highlighting as a response format).
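As a simplified illustration of quantifying and correcting a mode effect (not the exact procedure of the cited studies), the following sketch applies mean-mean linking to invented Rasch difficulty estimates calibrated separately per mode.

```python
import numpy as np

# Difficulty estimates for the same items, calibrated separately per mode
# (numbers invented for illustration).
b_pba = np.array([-0.8, -0.2, 0.1, 0.6, 1.1])  # paper-based assessment
b_cba = np.array([-0.6,  0.0, 0.4, 0.8, 1.4])  # computer-based assessment

# Mean-mean linking: the constant that places the CBA scale onto the PBA scale.
shift = b_pba.mean() - b_cba.mean()
b_cba_linked = b_cba + shift

# Item-level residuals after linking point to item-specific (partial) mode effects.
residuals = b_pba - b_cba_linked
print(f"Overall mode shift: {shift:.2f}")
print("Item-level deviations:", np.round(residuals, 2))
```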
For in-depth research on mode and setting effects, a procedure for log data collection in paper-based testing was developed at TBA and used in selected NEPS studies (see, e.g., Kroehne & Goldhammer, 2018). In this approach, digital ballpoint pens are used to answer the paper-administered questions in test booklets in which a special dot pattern is printed (see, among others, Dirk et al., 2017, for a description). While the entries made with these digital ballpoint pens are visible to panelists as if they had been made with an ordinary ballpoint pen, the coordinates and timestamps of all responses are additionally recorded via a Bluetooth-connected computer. This data collection method allows the analysis of answering processes, such as the comparison of processing times between paper-based and computer-based testing (see, e.g., Kroehne, Hahnel, & Goldhammer, 2019); a minimal sketch of this derivation follows below.
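A minimal sketch of this kind of derivation, assuming a simplified stroke format with per-stroke timestamps; field names and values are invented for the example.

```python
# Digital-pen data reduced to strokes: which question was written on, and when.
strokes = [
    {"question": "q1", "t_start": 3.2, "t_end": 4.0},
    {"question": "q1", "t_start": 4.5, "t_end": 6.1},
    {"question": "q2", "t_start": 9.8, "t_end": 12.3},
]

def processing_times(strokes):
    """Per question: time from the first pen-down to the last pen-up."""
    spans = {}
    for s in strokes:
        q = s["question"]
        first, last = spans.get(q, (s["t_start"], s["t_end"]))
        spans[q] = (min(first, s["t_start"]), max(last, s["t_end"]))
    return {q: round(last - first, 2) for q, (first, last) in spans.items()}

# These durations can then be compared with response times logged in computer-based testing.
print(processing_times(strokes))  # {'q1': 2.9, 'q2': 2.5}
```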
Selected Publications:
- Kroehne, U., Gnambs, T., & Goldhammer, F. (2019). Disentangling setting and mode effects for online competence assessment. In H.-P. Blossfeld & H.-G. Roßbach (Eds.), Education as a lifelong process (2nd ed.). Wiesbaden, Germany: Springer VS. https://doi.org/10.1007/978-3-658-23162-0
- Buerger, S., Kroehne, U., Köhler, C., & Goldhammer, F. (2019). What makes the difference? The impact of item properties on mode effects in reading assessments. Studies in Educational Evaluation, 62, 1–9. https://doi.org/10.1016/j.stueduc.2019.04.005
- Kroehne, U., Hahnel, C., & Goldhammer, F. (2019). Invariance of the response processes between gender and modes in an assessment of reading. Frontiers in Applied Mathematics and Statistics, 5:2. https://doi.org/10.3389/fams.2019.00002
- Kroehne, U., Buerger, S., Hahnel, C., & Goldhammer, F. (2019). Construct equivalence of PISA reading comprehension measured with paper-based and computer-based assessments. Educational Measurement: Issues and Practice, 38(3), 97–111. https://doi.org/10.1111/emip.12280
- Dirk, J., Kratzsch, G. K., Prindle, J. P., Kroehne, U., Goldhammer, F., & Schmiedek, F. (2017). Paper-based assessment of the effects of aging on response time: A diffusion model analysis. Journal of Intelligence, 5(2), 12. https://doi.org/10.3390/jintelligence5020012
- Buerger, S., Kroehne, U., & Goldhammer, F. (2016). The transition to computer-based testing in large-scale assessments: Investigating (partial) measurement invariance between modes. Psychological Test and Assessment Modeling, 58(4), 597–616.
- Goldhammer, F., & Kroehne, U. (2014). Controlling individuals' time spent on task in speeded performance measures: Experimental time limits, posterior time limits, and response time modeling. Applied Psychological Measurement, 38(4), 255–267. https://doi.org/10.1177/0146621613517164
Completed Project Phase 2009-2013
In the 2009–2013 project phase, preparatory to the Technology-Based Testing (TBT) work package, DIPF was responsible for the following tasks:
NEPS AP 13 B
- Software development for a data warehouse was located at the TBA Centre; the data warehouse ensures data access that is as fast as possible while observing data protection requirements.
- The development of the data warehouse was intended to guarantee a central data stock for the entire NEPS study and to provide appropriate tools for filtering as well as report production.
- Data warehouse: Three parallel software development processes ran during the project: (1) implementation, optimisation, and further development of databases, (2) implementation, optimisation, and further development of ETL and reporting tools, and (3) implementation, optimisation, and further development of a web portal (a toy ETL illustration follows this list).
- With the data warehouse, the data from the four assessment waves as well as the tools for filtering and report production were subsequently to be made accessible to the whole scientific community.
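As a toy illustration of the ETL idea (extract raw records, transform them into a harmonized schema, load them into a central store), using an invented record layout and an in-memory SQLite database in place of the actual warehouse:

```python
import sqlite3

# Extract: raw wave records as they might arrive from field work (invented layout).
raw_wave_records = [
    {"id": "p01", "wave": 1, "score_reading": "17"},
    {"id": "p02", "wave": 1, "score_reading": ""},   # empty string = no valid score
]

def transform(record):
    """Harmonize types and code empty responses as missing (None -> SQL NULL)."""
    score = record["score_reading"]
    return (record["id"], record["wave"], int(score) if score else None)

# Load: an in-memory database stands in for the central warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (person_id TEXT, wave INTEGER, reading INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)",
                 (transform(r) for r in raw_wave_records))

# Reporting query over the loaded data: valid reading scores per wave.
for row in conn.execute("SELECT wave, COUNT(reading) FROM results GROUP BY wave"):
    print(row)  # (1, 1)
```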
NEPS AP 13 C
- In preparation for electronic test assessment, empirical studies were conducted to identify possible differences between paper-based and computer-based testing (quantification of mode effects) and to link test results across modes (cross-mode linking).
- Mode-effect studies (equivalence studies combined with NEPS linking studies) were conducted to prepare test delivery on a technological basis. They aimed at testing the equivalence of paper-based assessment (PBA) and computer-based assessment (CBA) by means of different criteria. The organisation and execution of the mode-effect studies was carried out together with Pillar 1 (Competence Development in the Life Course).
Selected Publications:
- Kroehne, U., & Martens, T. (2011). Computer-based competence tests in the National Educational Panel Study: The challenge of mode effects. Zeitschrift für Erziehungswissenschaft, 14(S2), 169–. https://doi.org/10.1007/s11618-011-0185-4
- Rölke, H. (2012). The ItemBuilder: A graphical authoring system for complex item development. In T. Bastiaens & G. Marks (Eds.), Proceedings of E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 1 (pp. 344–353). Chesapeake, VA: AACE.
Cooperations
This project is being carried out in cooperation with:
- Leibniz Institute for Educational Trajectories (LIfBi) in Bamberg
- Leibniz Institute for Science and Mathematics Education at Kiel University (IPN)
Project Details
Status: Current project
Area of Focus: Education in the Digital World
Department: Teacher and Teaching Quality
Unit: Technology-Based Assessment
Education Sectors: Extracurricular Learning, Higher Education, Primary and Secondary Education
Duration: 01/2023 – 12/2027
Funding: External funding