Multiple Imputation via Local Regression (Miles)

Faculty/Professorship: Fakultät Sozial- und Wirtschaftswissenschaften: Theses
Author(s): Gaffert, Philipp
Publisher Information: Bamberg : opus
Year of publication: 2017
Pages: xiii, 73; illustrations, diagrams
Supervisor(s): Rässler, Susanne
Language(s): English
Dissertation, Otto-Friedrich-Universität Bamberg, 2017
DOI: 10.20378/irbo-49884
Licence: German Act on Copyright 
URN: urn:nbn:de:bvb:473-opus4-498847
Methods for statistical analysis generally rely on complete rectangular data sets. When the data are incomplete, e.g., due to nonresponse in surveys, the researcher must choose between three alternatives:

1. The analysis rests on the complete cases only: This is almost always the worst option. In market research, for example, missing values occur more often among younger respondents. Because relevant behavior such as media consumption or past purchases often correlates with age, a complete-case analysis gives the researcher misleading answers.
2. The missing data are imputed (i.e., filled in) with an ad-hoc method: Ad-hoc methods range from filling in mean values to nearest neighbor techniques. Whereas filling in mean values performs poorly, nearest neighbor approaches have the advantage of imputing plausible values and work well in some applications. Yet ad-hoc approaches generally suffer from two limitations: they do not apply to complex missing data patterns, and they distort statistical inference, such as t-tests, on the completed data sets.
3. The missing data are imputed with a method based on an explicit model: Such model-based methods can cope with the broadest range of missing data problems. However, they depend on a considerable set of assumptions and are susceptible to violations of those assumptions.
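The first two alternatives can be illustrated with a small simulation. This is a minimal sketch on hypothetical data: the age-spending relationship and the missingness rates are invented for illustration and are not taken from the dissertation. Missingness depends only on age, and the outcome correlates with age. As a result, the complete-case mean is biased, and mean imputation both keeps that bias and shrinks the spread of the data.

```python
import random
import statistics

def simulate(n=10_000, seed=1):
    """Hypothetical toy data: spending rises with age, and younger
    respondents are more likely to skip the spending question."""
    rng = random.Random(seed)
    ages, spend, observed = [], [], []
    for _ in range(n):
        age = rng.uniform(18, 80)
        y = 10 + 0.5 * age + rng.gauss(0, 5)   # outcome correlates with age
        p_missing = 0.6 if age < 35 else 0.1   # missingness depends on age only
        ages.append(age)
        spend.append(y)
        observed.append(rng.random() >= p_missing)
    return ages, spend, observed

ages, spend, observed = simulate()
complete = [y for y, o in zip(spend, observed) if o]

true_mean = statistics.mean(spend)
cc_mean = statistics.mean(complete)  # complete-case analysis

# Ad-hoc mean imputation: fill every gap with the complete-case mean.
mean_filled = [y if o else cc_mean for y, o in zip(spend, observed)]

print(f"true mean          {true_mean:6.2f}")
print(f"complete-case mean {cc_mean:6.2f}")  # too high: young people are under-represented
print(f"true SD            {statistics.stdev(spend):6.2f}")
print(f"mean-imputed SD    {statistics.stdev(mean_filled):6.2f}")  # too small
```

The artificially small standard deviation after mean imputation illustrates why inference such as t-tests on the completed data becomes overconfident.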

This dissertation proposes two new methods that build on ideas by Cleveland & Devlin (1988) and Siddique & Belin (2008). Both methods combine model-based imputation with nearest neighbor techniques. Compared to standard model-based imputation, they are equally broadly applicable but require fewer assumptions, and will therefore hopefully appeal to practitioners. The text first derives the proposed methods theoretically within the multiple imputation framework (Rubin, 1987) and then assesses their performance on both artificial data and a real-world TV consumption data set from the company GfK SE. In highly nonlinear data, we observe that one of the proposed methods outperforms alternative methods, and we therefore recommend its use in applications.
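The combination of model-based imputation and nearest neighbor donor selection described above is commonly known as predictive mean matching. The following is a minimal, generic sketch under stated assumptions. It is not the dissertation's own algorithm.

The sketch makes the following choices:

* It uses a global least-squares fit in place of local regression.
* It bootstraps the observed cases to reflect parameter uncertainty.
* It draws donors with probability proportional to a negative power of the distance in predicted means, in the spirit of Siddique & Belin (2008).
* It pools the M completed data sets with Rubin's (1987) combining rules.

The exponent `kappa`, the toy data, and all function names are hypothetical.

```python
import random
import statistics

def ols(xs, ys):
    """Least-squares fit y = a + b*x (a global stand-in for local regression)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def impute_once(x, y, rng, kappa=3.0):
    """One predictive-mean-matching draw with distance-weighted donors.
    y holds None for missing entries; kappa is a hypothetical tuning value."""
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    # Bootstrap the observed cases so coefficients vary across imputations.
    boot = [rng.choice(obs) for _ in obs]
    a, b = ols([x[i] for i in boot], [y[i] for i in boot])
    pred = [a + b * xi for xi in x]
    filled = list(y)
    for j in mis:
        # Donors with closer predicted means get larger selection weights.
        weights = [(abs(pred[i] - pred[j]) + 1e-6) ** (-kappa) for i in obs]
        donor = rng.choices(obs, weights=weights, k=1)[0]
        filled[j] = y[donor]  # always an observed, hence plausible, value
    return filled

def rubin_pool(estimates, variances):
    """Rubin's (1987) combining rules for a scalar estimand."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)  # pooled point estimate
    w = statistics.mean(variances)      # within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b   # total variance T

# Hypothetical toy data: nonlinear relation, y missing more often for small x.
rng = random.Random(42)
x = [rng.uniform(0, 3) for _ in range(300)]
y_full = [xi ** 2 + rng.gauss(0, 0.5) for xi in x]
y = [None if (xi < 1 and rng.random() < 0.5) else yi for xi, yi in zip(x, y_full)]

estimates, variances = [], []
for _ in range(5):  # M = 5 completed data sets
    completed = impute_once(x, y, rng)
    n = len(completed)
    estimates.append(statistics.mean(completed))
    variances.append(statistics.variance(completed) / n)  # squared SE of the mean

q_bar, t = rubin_pool(estimates, variances)
print(f"pooled mean {q_bar:.3f}, SE {t ** 0.5:.3f}")
```

Replacing `ols` with a locally weighted fit around each recipient, as in Cleveland & Devlin (1988), would move this sketch closer to the local regression idea. The pooled variance `t` is never smaller than the average within-imputation variance, because Rubin's rules add the between-imputation spread on top of it.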
GND Keywords: Datenerhebung (data collection); Fehlende Daten (missing data); Regressionsanalyse (regression analysis)
Keywords: Multiple Imputation, Predictive Mean Matching, Sequential Regressions, Local Regression, Distance-Aided Donor Selection
DDC Classification: 310 Statistics  
RVK Classification: QH 235   
Type: Doctoral thesis
Publication date: 29 November 2017

File: Gaffert_Dissopuskse_A2b.pdf (2.06 MB, PDF)