Analyzing the context of large-scale educational assessments using multilevel latent variable modeling
Rohm, Theresa (2022): Analyzing the context of large-scale educational assessments using multilevel latent variable modeling, Bamberg: Otto-Friedrich-Universität, doi: 10.20378/irb-55904.
Author:
Rohm, Theresa
Publisher Information:
Bamberg: Otto-Friedrich-Universität
Year of publication:
2022
Pages:
Supervisor:
Language:
English
Remark:
Cumulative dissertation, Otto-Friedrich-Universität Bamberg, 2022
DOI:
10.20378/irb-55904
Abstract:
Large-scale assessments in education often involve the measurement of latent competence, with the aim of comparing groups in subsequent analyses. The assessment of latent competence is thereby often performed within the institutional context of schools, or with the help of interviewers visiting respondents’ homes. In addition, sample selection for educational assessments might use groups as sampling units. For example, German secondary school types serve as primary sampling units in the multi-stage sampling procedures of educational studies. Several schools per school type are then selected for student competence assessment; in consequence, students are nested within clusters of schools. Hence, hierarchical data structures in large-scale educational assessments arise from the assessment context, the sampling design, or a combination of both. The context of educational assessments can introduce construct-irrelevant variance into the resulting measures of latent competence, in the form of variance between clusters. This can reduce the quality of measurement and impair a fair and valid interpretation of test scores. Further uncertainty in latent competence assessment arises from using a set of item indicators to measure the latent trait of interest. Besides the overall latent trait, single items can also vary by assessment context, as indicated by random item effects that denote item variation across clusters. In addition, item differences can occur between groups (i.e., fixed item effects) and, if severe, indicate measurement non-invariance in the form of differential item functioning between groups. Item bias is present when item differences between groups systematically favor a particular group. If item bias is found between groups that were used as sampling units (e.g., school types), it is associated with the context of competence assessment; group differences in items are then located at the cluster level of the hierarchical data structure.
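To illustrate how these variance components relate, the following is a minimal sketch of a Rasch-type multilevel measurement model with random item effects; the notation is purely illustrative and not taken from the dissertation itself.

% Illustrative sketch (hypothetical notation): person p in cluster c
% responds to item i; logit^{-1} denotes the inverse-logit function.
\begin{align}
  \Pr(Y_{pci} = 1) &= \operatorname{logit}^{-1}\bigl(\theta_p + \zeta_c - \beta_i - u_{ic}\bigr), \\
  \theta_p &\sim \mathcal{N}\bigl(0, \sigma^2_{\mathrm{within}}\bigr), && \text{person ability within clusters,} \\
  \zeta_c &\sim \mathcal{N}\bigl(0, \sigma^2_{\mathrm{between}}\bigr), && \text{construct-irrelevant cluster variance,} \\
  u_{ic} &\sim \mathcal{N}\bigl(0, \sigma^2_{i}\bigr), && \text{random item effect: item $i$ varies across clusters.}
\end{align}

Under this sketch, fixed item effects (differential item functioning between groups) would correspond to a group-specific shift in $\beta_i$, whereas random item effects correspond to the cluster-varying term $u_{ic}$.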
For the detection of such sources of construct-irrelevant variance stemming from the assessment context, as well as of item bias and item variance associated with the assessment context, multilevel latent variable models are presented and applied to competence and cognitive ability assessments. Competence measurements in the domains of mathematics and reading are examined, as well as a measure of the cognitive ability perceptual speed, all assessed within the National Educational Panel Study. In the first study, interviewer and area clusters were investigated in an adult mathematics competence assessment as hierarchical structures that might introduce construct-irrelevant variance. The examination of cross-classified multilevel item response theory models showed substantial interviewer variance in mathematics competence, while area effects were small. Subsequent analyses revealed some interviewers with undue influence; these interviewers were also associated with respondents’ number of missing values on the assessed test and with participation rates in the subsequent competence assessment. The second study investigated the consequences of item bias, in the form of cluster-level group differences in items by school type, for students’ reading competence development from fifth to ninth grade of German secondary school. Measurement non-invariance occurred especially between the highest and lowest German secondary school types at all measurement occasions. Nevertheless, school type comparisons of reading competence development were not sensitive to the measurement non-invariance found between school types, and a parallel development of reading competence across German secondary school types was found. In the third study, in addition to cluster-level group differences in item estimates by school type (i.e., fixed item effects), random item effects across school clusters per school type were investigated for three items measuring perceptual speed among German secondary school students in ninth grade. Fixed- and random-group differential item functioning was investigated by comparing students from several types of regular schools with students with special educational needs. Random-group differential item functioning was found for two of the three items, indicating that estimated item difficulties differed across school clusters. Such differences across assessment contexts (i.e., school clusters) might stem from problems of standardized test administration. Finally, the results of the three studies are discussed with regard to standards for educational and psychological testing. The results are furthermore compared to empirical evaluations of context effects in other large-scale educational assessments.
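As a concrete illustration of the cross-classified structure examined in the first study, the sketch below simulates placeholder data and specifies a Rasch-type model with crossed interviewer and area random effects. All names and data are hypothetical, and PyMC is merely one possible tool; this is not the software or model specification used in the dissertation.

import numpy as np
import pymc as pm

# Hypothetical dimensions: P persons answer I items; each person is
# handled by one interviewer and lives in one area (crossed, not nested).
rng = np.random.default_rng(0)
P, I, N_INT, N_AREA = 200, 10, 20, 15
interviewer = rng.integers(0, N_INT, size=P)   # interviewer per person
area = rng.integers(0, N_AREA, size=P)         # area per person
y = rng.integers(0, 2, size=(P, I))            # placeholder 0/1 responses

# Long format: one row per person-item response.
person_idx = np.repeat(np.arange(P), I)
item_idx = np.tile(np.arange(I), P)

with pm.Model() as crossed_rasch:
    # Residual person ability after removing cluster effects.
    theta = pm.Normal("theta", mu=0.0, sigma=1.0, shape=P)
    # Crossed variance components: interviewer and area effects.
    sigma_int = pm.HalfNormal("sigma_int", sigma=1.0)
    sigma_area = pm.HalfNormal("sigma_area", sigma=1.0)
    zeta_int = pm.Normal("zeta_int", mu=0.0, sigma=sigma_int, shape=N_INT)
    zeta_area = pm.Normal("zeta_area", mu=0.0, sigma=sigma_area, shape=N_AREA)
    # Item difficulties.
    beta = pm.Normal("beta", mu=0.0, sigma=2.0, shape=I)
    # Rasch-type predictor: ability plus crossed cluster effects minus difficulty.
    eta = (theta[person_idx]
           + zeta_int[interviewer[person_idx]]
           + zeta_area[area[person_idx]]
           - beta[item_idx])
    pm.Bernoulli("obs", logit_p=eta, observed=y.reshape(-1))
    idata = pm.sample(draws=500, tune=500, chains=2)

# Comparing the posteriors of sigma_int and sigma_area shows how much
# construct-irrelevant variance each classification contributes.

In such a sketch, a sigma_int posterior concentrated away from zero alongside a small sigma_area would mirror the pattern reported in the first study: notable interviewer variance with only small area effects.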
GND Keywords: Nationales Bildungspanel; Pädagogik; Forschung
Keywords: assessment context; item response theory; multilevel latent variable modeling; large-scale educational assessments; National Educational Panel Study
DDC Classification:
RVK Classification:
Type:
Doctoral thesis
Activation date:
November 8, 2022
Permalink:
https://fis.uni-bamberg.de/handle/uniba/55904