Missing by Design Patterns for Optimizing Survey Response by Efficient and Consistent Data Collection

Bahrami, Sara

doi:10.20378/irb-49487

Faculty/Chair:

University of Bamberg

Faculty of Social Sciences, Economics, and Business Administration: Theses

Author:

Bahrami, Sara

Corporate Body:

Otto-Friedrich-Universität Bamberg

Publisher Information:

Bamberg : Otto-Friedrich-Universität

Year of publication:

2021

Pages:

XIV, 109; illustrations

Supervisor:

Aßmann, Christian; Schmid, Timo; Engelhardt-Wölfler, Henriette

Year of first publication:

2020

Language:

English

Remark:

Dissertation, Otto-Friedrich-Universität Bamberg, 2020

DOI:

10.20378/irb-49487

Licence:

Creative Commons - CC BY - Attribution 4.0 International

URN:

urn:nbn:de:bvb:473-irb-494874

Abstract:

Respondent burden from long survey questionnaires can reduce both the response rate and the quality of responses. One solution is a split questionnaire design (SQD). In an SQD, the items of the long questionnaire are divided into subsets, and only a fraction of these item subsets is assigned to each random subsample of individuals. This yields several shorter questionnaires, each administered to a random subsample. The completed sub-questionnaires are then combined, and the values missing by design are imputed by means of multiple imputation. Identification problems can be avoided in advance by ensuring that the combination of variables in the analysis model of interest is jointly observed for at least one subsample of individuals. Furthermore, the most important concern in designing an SQD is to include an appropriate combination of items in each sub-questionnaire so as to reduce information loss; that is, highly correlated items that explain each other well should not be jointly missing. For this reason, training data from previous surveys or a pilot study must be available to exploit the associations between the variables.
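The assignment step and the joint-observation check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the item names, the helper functions `build_forms`, `assign_forms`, and `jointly_observed`, and the choice of building one form per combination of item subsets are all hypothetical.

```python
import itertools
import random


def build_forms(item_subsets, subsets_per_form):
    """Build one form (sub-questionnaire) per combination of item subsets.

    Assumption: every combination of `subsets_per_form` subsets becomes a form.
    """
    forms = []
    for combo in itertools.combinations(range(len(item_subsets)), subsets_per_form):
        forms.append([item for k in combo for item in item_subsets[k]])
    return forms


def assign_forms(n_individuals, n_forms, seed=0):
    """Randomly assign one form index to each individual, in near-equal shares."""
    rng = random.Random(seed)
    assignment = [i % n_forms for i in range(n_individuals)]
    rng.shuffle(assignment)
    return assignment


def jointly_observed(forms, variables):
    """Return True if at least one form contains all of the given variables."""
    return any(set(variables) <= set(form) for form in forms)


# Hypothetical items: three subsets, each form carries two of them.
item_subsets = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
forms = build_forms(item_subsets, subsets_per_form=2)
assignment = assign_forms(n_individuals=9, n_forms=len(forms))

# A model using a1 and c2 is identified; one needing a1, b1 and c1 together is not.
print(jointly_observed(forms, ["a1", "c2"]))
print(jointly_observed(forms, ["a1", "b1", "c1"]))
```

With two of three subsets per form, every pair of items is observed together in some form, but no form covers all three subsets, so an analysis model that needs items from all three at once would not be identified under this toy design.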

In this thesis, two SQDs are proposed. The first study introduces a potential design for data from the National Educational Panel Study (NEPS). The data consist of items that can be divided and allocated into blocks according to their context, with the objective that within-block correlations are higher than between-block correlations. Under the design, the target sample is divided into subsamples. Each subsample is assigned all items of one block plus a randomly drawn fraction of the items of the remaining blocks, where items that belong to blocks with relatively higher correlations are drawn with lower probability. The design is evaluated by means of several ex-post investigations: it is imposed on complete data, and several models are estimated for both the complete data and the data deleted by design. The design is also compared with a random multiple matrix sampling design, which assigns a random subset of items to each sample individual.
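The item draw for a single subsample might look roughly like the sketch below. The abstract does not state the exact rule linking correlation to drawing probability, so the linear rule used here, the `base_prob` parameter, the per-block correlation scores, and the function name `draw_subsample_items` are assumptions made purely for illustration.

```python
import random


def draw_subsample_items(blocks, block_corr, full_block, base_prob=0.6, seed=0):
    """Return the items for one subsample.

    blocks:      dict mapping block name -> list of item names
    block_corr:  dict mapping block name -> correlation score in [0, 1]
                 (hypothetical input, e.g. estimated from training data)
    full_block:  the block whose items are all assigned to this subsample
    """
    rng = random.Random(seed)
    selected = list(blocks[full_block])  # the whole block is always included
    for name, items in blocks.items():
        if name == full_block:
            continue
        # Assumed rule: the higher the block's correlation score,
        # the lower the probability that each of its items is drawn.
        prob = base_prob * (1.0 - block_corr[name])
        selected.extend(item for item in items if rng.random() < prob)
    return selected


# Hypothetical blocks and correlation scores.
blocks = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]}
block_corr = {"A": 0.2, "B": 0.8, "C": 0.4}
items_for_subsample = draw_subsample_items(blocks, block_corr, full_block="A")
print(items_for_subsample)
```

The intuition behind the lower drawing probability is that items which are well explained by other observed items lose less information when they are left out and imputed later.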

In the second study, a genetic algorithm is used to search a vast number of candidate SQDs for the optimal design. The algorithm evaluates each design by the fraction of missing information (FMI) that it induces; the optimal design is the one with the smallest FMI. The optimal design is evaluated by means of several simulation studies and is compared with a random multiple matrix sampling (MMS) design.
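To make the two ingredients concrete, the sketch below computes a large-sample FMI from multiply imputed analyses using Rubin's combining rules and runs a generic genetic search that minimises a supplied fitness function. The abstract does not specify the thesis's encoding of designs, its genetic operators, or its exact FMI estimator, so the 0/1 encoding, truncation selection, one-point crossover, elitism, and the toy fitness used in the usage lines are all assumptions.

```python
import random
from statistics import mean, variance


def fraction_missing_information(estimates, within_variances):
    """Large-sample FMI from m imputed-data analyses (Rubin's combining rules)."""
    m = len(estimates)
    u_bar = mean(within_variances)        # average within-imputation variance
    b = variance(estimates)               # between-imputation variance
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return (1 + 1 / m) * b / total


def genetic_search(fitness, n_genes, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Minimise `fitness` over 0/1 vectors; a lower value means a better design."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        next_gen = ranked[:2]                         # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Truncation selection: parents come from the better half.
            mother, father = rng.sample(ranked[:pop_size // 2], 2)
            cut = rng.randrange(1, n_genes)           # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


# FMI for three hypothetical imputed-data estimates with unit within-variance.
fmi = fraction_missing_information([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])

# Toy stand-in fitness (number of ones); a real fitness would impose the
# design on training data, impute, and return the resulting FMI.
best = genetic_search(fitness=sum, n_genes=8)
```

In a realistic setting the fitness evaluation dominates the cost, since every candidate design requires its own imputation run before an FMI can be computed.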

GND Keywords:

Datenerhebung (data collection); Stichprobe (sample); Fehlende Daten (missing data); Nationales Bildungspanel (National Educational Panel Study)

Keywords:

Planned missing design, Split questionnaire survey design, Multiform design, Multiple imputation, Multiple matrix sampling design, Missing data patterns, Optimizing survey response, Genetic algorithm

DDC Classification:

330 Economics; 310 Statistics

RVK Classification:

QH 235

Type:

Doctoral thesis

URI:

https://fis.uni-bamberg.de/handle/uniba/49487

Activation date:

March 23, 2021