Bayesian estimation of latent trait distributions considering hierarchical structures and partially missing covariate data

Large-scale studies in the social sciences often measure latent constructs and investigate their relationship with additional variables in subsequent analyses. In this context, the analyst faces three problems. First, the particular indicators that measure the trait of interest introduce uncertainty. Second, large-scale studies typically exhibit hierarchical structures caused by the sampling design or by a composite population consisting of clustered observations. Third, missing values in covariates related to the latent construct introduce further uncertainty. This thesis provides a Bayesian estimation strategy that addresses all three issues simultaneously. I start with the class of latent regression item response models, which combine measurement models and structural analysis, and develop a novel algorithm based on the device of data augmentation. Both binary and ordered polytomous items can be included in the analysis. Population heterogeneity is taken into account through multigroup, finite mixture, or random intercept specifications. Sampling from the posterior distribution of the parameters is augmented by sampling from the full conditional distributions of missing values in person covariates. These distributions are approximated with classification and regression trees, which allows high flexibility in incorporating metric as well as categorical variables and nonlinear relationships. The statistical accuracy of the proposed strategy is evaluated in two simulation studies that control the missing data generating mechanism. I show that the novel algorithm recovers all involved parameters in both scenarios and clearly outperforms stochastic regression imputation and complete-case analysis.
Two illustrations using data from the National Educational Panel Study on the mathematical abilities and eating disorders of ninth-grade students demonstrate the empirical usefulness of the method. Finally, I introduce an R package that implements the estimation routines presented in the thesis.
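The abstract describes the sampler only in words. The Python sketch below illustrates its two central ideas: data augmentation for binary item responses, and drawing missing person covariates from their full conditional distribution inside the same Gibbs sampler. It rests on simplifying assumptions that are not taken from the thesis: a normal-ogive (probit) link with unit discriminations, a single covariate x with slope γ and no intercept, no hierarchical structure, and a standard-normal model for x that stands in for the classification and regression tree approximations. The function names (`gibbs_latent_regression`, `draw_latent_response`, `simulate_data`) are hypothetical and are not part of the R package introduced in the thesis.

```python
# Hypothetical sketch: assumes a probit link, unit discriminations, and
# x ~ N(0, 1) for the covariate in place of the CART approximations.
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal; only its inverse CDF is used


def phi_cdf(v):
    """Standard normal CDF written with erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def draw_latent_response(rng, mean, y):
    """Draw Z ~ N(mean, 1) truncated to Z > 0 if y == 1 and Z <= 0 if y == 0."""
    u = 1.0 - rng.random()  # u in (0, 1]
    if y == 1:
        # Z = mean + e > 0  <=>  w = -e <= mean: invert the CDF of w on (-inf, mean].
        p, sign = u * phi_cdf(mean), -1.0
    else:
        # Z = mean + e <= 0  <=>  e <= -mean: invert the CDF of e on (-inf, -mean].
        p, sign = u * phi_cdf(-mean), 1.0
    p = min(max(p, 1e-300), 1.0 - 1e-16)  # keep p strictly inside (0, 1)
    return mean + sign * _STD.inv_cdf(p)


def gibbs_latent_regression(y, x, n_iter=400, burn_in=100, prior_var_b=4.0, seed=1):
    """y: N x J nested list of 0/1 responses; x: length-N list, None marks a missing value.

    Returns posterior means of gamma, the item difficulties b, and the imputed x values.
    """
    rng = random.Random(seed)
    n, n_items = len(y), len(y[0])
    missing = [i for i in range(n) if x[i] is None]
    x_cur = [0.0 if v is None else float(v) for v in x]  # missing x start at 0
    theta, b, gamma = [0.0] * n, [0.0] * n_items, 0.0
    kept, sum_gamma = 0, 0.0
    sum_b, sum_x = [0.0] * n_items, {i: 0.0 for i in missing}
    for it in range(n_iter):
        # 1) Data augmentation: continuous latent responses behind each binary item.
        z = [[draw_latent_response(rng, theta[i] - b[j], y[i][j]) for j in range(n_items)]
             for i in range(n)]
        # 2) Abilities: prior N(gamma * x_i, 1) combined with Z_ij + b_j ~ N(theta_i, 1).
        prec = n_items + 1.0
        for i in range(n):
            mean = (gamma * x_cur[i] + sum(z[i][j] + b[j] for j in range(n_items))) / prec
            theta[i] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # 3) Difficulties: prior N(0, prior_var_b) combined with theta_i - Z_ij ~ N(b_j, 1).
        prec = n + 1.0 / prior_var_b
        for j in range(n_items):
            mean = sum(theta[i] - z[i][j] for i in range(n)) / prec
            b[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # 4) Latent regression slope under a flat prior: theta_i ~ N(gamma * x_i, 1).
        sxx = sum(v * v for v in x_cur)
        sxt = sum(v * t for v, t in zip(x_cur, theta))
        gamma = rng.gauss(sxt / sxx, 1.0 / math.sqrt(sxx))
        # 5) Missing covariates from their full conditional, assuming x_i ~ N(0, 1).
        prec_x = 1.0 + gamma * gamma
        for i in missing:
            x_cur[i] = rng.gauss(gamma * theta[i] / prec_x, 1.0 / math.sqrt(prec_x))
        if it >= burn_in:  # accumulate draws after burn-in for posterior means
            kept += 1
            sum_gamma += gamma
            for j in range(n_items):
                sum_b[j] += b[j]
            for i in missing:
                sum_x[i] += x_cur[i]
    return {"gamma": sum_gamma / kept,
            "b": [s / kept for s in sum_b],
            "x_imputed": {i: s / kept for i, s in sum_x.items()}}


def simulate_data(n=200, n_items=10, gamma=0.8, miss_rate=0.25, seed=42):
    """Simulate from the same assumed model, with covariate values missing completely at random."""
    rng = random.Random(seed)
    true_b = [-1.5 + 3.0 * j / (n_items - 1) for j in range(n_items)]
    x_full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    theta = [rng.gauss(gamma * v, 1.0) for v in x_full]
    y = [[1 if rng.gauss(t - bj, 1.0) > 0 else 0 for bj in true_b] for t in theta]
    x_obs = [None if rng.random() < miss_rate else v for v in x_full]
    return y, x_obs, true_b


if __name__ == "__main__":
    y, x_obs, true_b = simulate_data()
    est = gibbs_latent_regression(y, x_obs)
    print("gamma:", round(est["gamma"], 2), "(true 0.8)")
    print("b:", [round(v, 2) for v in est["b"]])
```

The standard-normal covariate model in step 5 was chosen only because its full conditional has a closed form; replacing that step with draws from a tree-based approximation would bring the sketch closer to the strategy described in the abstract.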

GND Keywords:

Markov model; Monte Carlo simulation; Probabilistic test theory; Missing data; Population; Heterogeneity; Statistics; Data processing

Keywords:

item response theory; population heterogeneity; Markov chain Monte Carlo; multiple imputation; statistical computing

DDC Classification:

310 Statistics

RVK Classification:

QH 239

QH 250

QH 235

Type:

Doctoral thesis

URI:

https://fis.uni-bamberg.de/handle/uniba/42632

Activation date:

December 1, 2017

