Missing by Design Patterns for Optimizing Survey Response by Efficient and Consistent Data Collection
|Professorship/Faculty:||University of Bamberg ; Faculty of Social Sciences, Economics, and Business Administration: Theses|
|Corporate Body:||University of Bamberg|
|Publisher Information:||Bamberg : Otto-Friedrich-Universität|
|Year of publication:||2021|
|Pages:||XIV, 109 ; illustrations|
|Supervisor(s):||Aßmann, Christian ; Schmid, Timo ; Engelhardt-Wölfler, Henriette|
|Year of first publication:||2020|
Dissertation, Otto-Friedrich-Universität Bamberg, 2020
|Licence:||Creative Commons - CC BY - Attribution 4.0 International|
Respondent burden caused by long questionnaires can lower both the response rate and the quality of responses. One solution is a split questionnaire design (SQD). In an SQD, the items of the long questionnaire are divided into subsets, and only a fraction of these item subsets is assigned to each random subsample of individuals. This yields several shorter questionnaires, each administered to a random subsample of individuals. The completed sub-questionnaires are then combined, and the values missing by design are imputed by means of multiple imputation. Identification problems can be avoided in advance by ensuring that the variables combined in the analysis model of interest are jointly observed on at least one subsample of individuals. Furthermore, the most important concern in designing an SQD is to include an appropriate combination of items in each sub-questionnaire so that information loss is reduced, i.e. highly correlated items that explain each other well should not be jointly missing. For this reason, training data from previous surveys or a pilot study must be available to exploit the associations between the variables.
In this thesis, two SQDs are proposed. The first study introduces a potential design for data from the National Educational Panel Study (NEPS). The data consist of items that can be allocated to blocks according to their context, with the objective that within-block correlations are higher than between-block correlations. Under the design, the target sample is divided into subsamples. Each subsample is assigned all items of one block plus a randomly drawn fraction of the items of the remaining blocks, where items belonging to blocks with relatively higher correlations are drawn with lower probability. The design is evaluated by means of several ex-post investigations: it is imposed on complete data, and several models are estimated both for the complete data and for the data deleted by design. The design is also compared with a random multiple matrix sampling (MMS) design, which assigns a random subset of items to each sample individual.
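The block-based assignment described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `assign_sqd`, the parameter `frac_extra`, and the weighting rule (draw weight equal to one minus the block's mean correlation) are assumptions made purely for illustration, since the abstract does not state the exact drawing probabilities.

```python
import numpy as np

def assign_sqd(blocks, block_corr, n_individuals, frac_extra=0.5, seed=0):
    """Sketch of a block-based split questionnaire design.

    blocks       : list of lists of item indices, one list per block
    block_corr   : mean within-block correlation per block, each in [0, 1)
    frac_extra   : hypothetical share of the remaining items each subsample gets
    Returns (observed, subsample): a boolean individuals-by-items matrix
    (True = item is asked) and the subsample label of each individual.
    """
    rng = np.random.default_rng(seed)
    n_blocks = len(blocks)
    n_items = sum(len(b) for b in blocks)

    # Randomly split the individuals into one subsample per block.
    subsample = rng.permutation(n_individuals) % n_blocks

    # Assumed weighting rule: items from highly correlated blocks are
    # drawn with lower probability.
    item_weight = np.empty(n_items)
    for b, items in enumerate(blocks):
        item_weight[items] = 1.0 - block_corr[b]

    observed = np.zeros((n_individuals, n_items), dtype=bool)
    for k in range(n_blocks):
        own = np.asarray(blocks[k])
        rest = np.setdiff1d(np.arange(n_items), own)
        n_extra = int(round(frac_extra * rest.size))
        p = item_weight[rest] / item_weight[rest].sum()
        extra = rng.choice(rest, size=n_extra, replace=False, p=p)

        members = np.flatnonzero(subsample == k)
        observed[np.ix_(members, own)] = True    # the whole own block
        observed[np.ix_(members, extra)] = True  # drawn items from other blocks
    return observed, subsample

# Toy example: 9 items in 3 blocks, 30 individuals.
blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
observed, subsample = assign_sqd(blocks, [0.6, 0.3, 0.1], n_individuals=30)
```

In this toy setting every individual answers the three items of one block plus three of the six remaining items, so each sub-questionnaire contains six of the nine items. The sketch does not check that every pair of variables in an analysis model is jointly observed; a real design would need that check to avoid the identification problems mentioned above.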
In the second study, a genetic algorithm is used to search among a vast number of SQDs to find the optimal design. The algorithm evaluates the designs by the fraction of missing information (FMI) induced by the design. The optimal design is the one with the smallest FMI. The optimal design is evaluated by means of several simulation studies and is compared with a random MMS design.
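As a rough illustration of the criterion, the following Python sketch computes a common large-sample version of the FMI from multiply imputed estimates using Rubin's combining rules. The function name and inputs are hypothetical, and the thesis may define or estimate the FMI differently (for example, with a degrees-of-freedom adjustment); a genetic algorithm could use such a value as the fitness of a candidate design, with lower values being better.

```python
import numpy as np

def fraction_missing_information(estimates, variances):
    """Large-sample FMI for one parameter from m multiply imputed analyses.

    estimates : the m point estimates, one per imputed data set
    variances : the m squared standard errors of those estimates
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    within = u.mean()                    # average within-imputation variance
    between = q.var(ddof=1)              # between-imputation variance
    total = within + (1 + 1 / m) * between
    # Share of the total variance that is attributable to the missing data.
    return (1 + 1 / m) * between / total

# Hypothetical estimates from m = 3 imputed data sets.
fmi = fraction_missing_information([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
```

When all imputed data sets give the same estimate, the between-imputation variance is zero and the function returns 0, which corresponds to a design that loses no information about that parameter.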
|SWD Keywords:||Datenerhebung ; Stichprobe ; Fehlende Daten ; Nationales Bildungspanel|
|Keywords:||Planned missing design, Split questionnaire survey design, Multiform design, Multiple imputation, Multiple matrix sampling design, Missing data patterns, Optimizing survey response, Genetic algorithm|
|DDC Classification:||330 Economics|
|RVK Classification:||QH 235|
|Release Date:||23 March 2021|
originated at the
University of Bamberg