Missing by Design Patterns for Optimizing Survey Response by Efficient and Consistent Data Collection

Faculty/Professorship: University of Bamberg; Faculty of Social Sciences, Economics, and Business Administration: Theses
Author(s): Bahrami, Sara
Corporate Body: Otto-Friedrich-Universität Bamberg
Publisher Information: Bamberg : Otto-Friedrich-Universität
Year of publication: 2021
Pages: XIV, 109; illustrations
Supervisor(s): Aßmann, Christian; Schmid, Timo; Engelhardt-Wölfler, Henriette
Year of first publication: 2020
Language(s): English
Dissertation, Otto-Friedrich-Universität Bamberg, 2020
DOI: 10.20378/irb-49487
Licence: Creative Commons - CC BY - Attribution 4.0 International 
URN: urn:nbn:de:bvb:473-irb-494874
Long questionnaires impose a burden on respondents that can lower both the response rate and the quality of responses. One solution is a split questionnaire design (SQD). In an SQD, the items of the long questionnaire are divided into subsets, and only a fraction of these item subsets is assigned to each random subsample of individuals. The result is several shorter questionnaires, each administered to a random subsample. The completed sub-questionnaires are then combined, and the values missing by design are imputed by means of multiple imputation. Identification problems can be avoided in advance by ensuring that the variables in the analysis model of interest are jointly observed for at least one subsample of individuals. Furthermore, the most important concern in designing an SQD is to include an appropriate combination of items in each sub-questionnaire so that information loss is reduced, i.e., highly correlated items that explain each other well should not be jointly missing. For this reason, training data from previous surveys or a pilot study must be available to exploit the associations between the variables.
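
The splitting and assignment steps described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the item names, the number of subsets, and the helper functions `build_forms`, `assign_forms`, and `jointly_observed` are hypothetical and serve only to show how sub-questionnaires can be formed from item subsets and how joint observation of analysis variables can be checked in advance.

```python
import itertools
import random


def build_forms(item_subsets, subsets_per_form):
    """Build one sub-questionnaire per combination of item subsets."""
    forms = []
    indices = range(len(item_subsets))
    for combo in itertools.combinations(indices, subsets_per_form):
        forms.append(sorted(item for s in combo for item in item_subsets[s]))
    return forms


def assign_forms(n_respondents, n_forms, rng):
    """Assign forms to respondents at random, in (nearly) equal shares."""
    assignment = [i % n_forms for i in range(n_respondents)]
    rng.shuffle(assignment)
    return assignment


def jointly_observed(forms, variables):
    """True if at least one form contains all of the given variables."""
    return any(set(variables) <= set(form) for form in forms)


# Hypothetical questionnaire: eight items split into four subsets.
item_subsets = [["q1", "q2"], ["q3", "q4"], ["q5", "q6"], ["q7", "q8"]]

# Each form carries two of the four subsets, i.e. half of the items.
forms = build_forms(item_subsets, subsets_per_form=2)

rng = random.Random(0)
assignment = assign_forms(n_respondents=12, n_forms=len(forms), rng=rng)

# Any model with variables from two subsets is covered by some form,
# but a model that needs three subsets is not identified by this design.
pair_ok = jointly_observed(forms, ["q1", "q8"])
triple_ok = jointly_observed(forms, ["q1", "q3", "q5"])
```

In this sketch every pair of subsets appears together on exactly one form, so any analysis model that draws on at most two subsets has jointly observed data, while a model spanning three subsets does not.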

In this thesis, two SQDs are proposed. The first study introduces a potential design for data from the National Educational Panel Study (NEPS). The data consist of items that can be grouped into blocks according to their context, with the objective that within-block correlations are high relative to between-block correlations. Under the design, the target sample is divided into subsamples. Each subsample is assigned all items of one whole block plus a randomly drawn fraction of the items of the remaining blocks, where items belonging to blocks with relatively higher correlations are drawn with lower probability. The design is evaluated by means of several ex-post investigations: it is imposed on complete data, and several models are estimated for both the complete data and the data deleted by design. The design is also compared with a random multiple matrix sampling (MMS) design, which assigns a random subset of items to each sample individual.
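
A rough Python sketch of such a block-based assignment follows. It is an illustration under stated assumptions, not the design from the thesis: the block names, the per-block correlation summaries in `block_corr`, and the rule that sets the draw probability proportional to one minus that correlation are all hypothetical.

```python
import random


def draw_probabilities(block_corr, base_fraction):
    """Assumed rule: blocks with a higher correlation summary get a lower
    draw probability; the average over blocks is roughly base_fraction."""
    weights = {b: 1.0 - r for b, r in block_corr.items()}
    mean_weight = sum(weights.values()) / len(weights)
    return {b: min(1.0, base_fraction * w / mean_weight)
            for b, w in weights.items()}


def items_for_subsample(blocks, full_block, probs, rng):
    """One whole block plus randomly drawn items of the remaining blocks."""
    items = list(blocks[full_block])
    for name, block_items in blocks.items():
        if name != full_block:
            items += [it for it in block_items if rng.random() < probs[name]]
    return items


def impose_design(record, observed_items):
    """Delete (set to None) every value the design does not collect."""
    return {item: (value if item in observed_items else None)
            for item, value in record.items()}


# Hypothetical blocks and correlation summaries.
blocks = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
    "C": ["c1", "c2", "c3", "c4"],
}
block_corr = {"A": 0.6, "B": 0.4, "C": 0.2}

probs = draw_probabilities(block_corr, base_fraction=0.5)
rng = random.Random(1)

# One subsample per block: that block is asked in full.
design = {b: items_for_subsample(blocks, b, probs, rng) for b in blocks}

# Ex-post check on a complete record: keep only what subsample "A" is asked.
complete_record = {item: 1 for items in blocks.values() for item in items}
masked = impose_design(complete_record, set(design["A"]))
```

Imposing the design on complete data in this way is what allows estimates from the complete data and from the data deleted by design to be compared.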

In the second study, a genetic algorithm is used to search among a vast number of SQDs to find the optimal design. The algorithm evaluates the designs by the fraction of missing information (FMI) induced by the design. The optimal design is the one with the smallest FMI. The optimal design is evaluated by means of several simulation studies and is compared with a random MMS design.
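
The search idea can be sketched in Python as follows. This is a generic genetic algorithm under stated assumptions, not the algorithm from the thesis: the design encoding (one subset label per item), the operators, and all parameter values are hypothetical. Because computing the FMI requires imputing the data, the sketch uses a simpler stand-in cost, the total absolute correlation between items that are always missing together. The function `approx_fmi` shows a common large-sample approximation of the FMI from Rubin's combining rules, (1 + 1/m)B / T, which may differ from the exact definition used in the thesis.

```python
import random
from statistics import mean, variance


def approx_fmi(estimates, within_variances):
    """Approximate FMI from m imputed-data estimates and their variances."""
    m = len(estimates)
    w = mean(within_variances)        # within-imputation variance
    b = variance(estimates)           # between-imputation variance
    t = w + (1 + 1 / m) * b           # total variance
    return (1 + 1 / m) * b / t


def jointly_missing_cost(design, corr):
    """Stand-in for FMI: sum of |corr| over item pairs in the same subset."""
    cost = 0.0
    for i in range(len(design)):
        for j in range(i + 1, len(design)):
            if design[i] == design[j]:
                cost += abs(corr[i][j])
    return cost


def genetic_search(corr, n_subsets, pop_size=30, generations=50,
                   mutation_rate=0.1, seed=0):
    """Search for an item-to-subset allocation with low cost."""
    rng = random.Random(seed)
    n_items = len(corr)

    def cost(design):
        return jointly_missing_cost(design, corr)

    population = [[rng.randrange(n_subsets) for _ in range(n_items)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = [list(d) for d in population[:2]]   # keep the two best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            cut = rng.randrange(1, n_items)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_subsets)
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=cost)


# Hypothetical training correlations: items (0,1), (2,3), (4,5) are strongly
# related; all other pairs are weakly related.
strong = {(0, 1), (2, 3), (4, 5)}
corr = [[1.0 if i == j else
         (0.8 if (min(i, j), max(i, j)) in strong else 0.1)
         for j in range(6)] for i in range(6)]

best = genetic_search(corr, n_subsets=3)
best_cost = jointly_missing_cost(best, corr)
```

Replacing `jointly_missing_cost` with a function that imposes a candidate design on training data, imputes the deleted values, and returns the resulting FMI would bring the sketch closer to the evaluation criterion described above.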
GND Keywords: Datenerhebung; Stichprobe; Fehlende Daten; Nationales Bildungspanel
Keywords: Planned missing design, Split questionnaire survey design, Multiform design, Multiple imputation, Multiple matrix sampling design, Missing data patterns, Optimizing survey response, Genetic algorithm
DDC Classification: 330 Economics; 310 Statistics
RVK Classification: QH 235   
Type: Doctoral thesis
URI: https://fis.uni-bamberg.de/handle/uniba/49487
Release Date: 23 March 2021

File: fisba49487_A3a.pdf (PDF, 4.28 MB)