Recent Advances in Small Area Estimation of Economic and Poverty Indicators using Traditional and Alternative Data
Lee, Yeonjoo (2024): Recent Advances in Small Area Estimation of Economic and Poverty Indicators using Traditional and Alternative Data, Bamberg: Otto-Friedrich-Universität, doi: 10.20378/irb-103063.
Author:
Lee, Yeonjoo
Publisher Information:
Bamberg: Otto-Friedrich-Universität
Year of publication:
2024
Pages:
Supervisor:
Language:
English
Remark:
Cumulative dissertation, Otto-Friedrich-Universität Bamberg, 2024
DOI:
10.20378/irb-103063
Abstract:
Chapter 1 - Variable selection using conditional AIC for linear mixed models with data-driven transformations
When data analysts use linear mixed models, they usually encounter two practical problems: a) the true model is unknown and b) the Gaussian assumptions on the error terms do not hold. Although these problems commonly appear together, researchers tend to treat them separately by a) finding an optimal model based on the conditional Akaike information criterion (cAIC) and b) applying transformations to the dependent variable. However, the optimal model depends on the transformation and vice versa. In this paper, we aim to solve both problems simultaneously. In particular, we propose an adjusted cAIC that uses the Jacobian of the respective transformation so that model candidates fitted to differently transformed data can be compared. From a computational perspective, we propose a stepwise selection approach based on the adjusted cAIC. Model-based simulations are used to compare the proposed selection approach with alternative approaches. Finally, the approach is applied to Mexican data to estimate poverty and inequality indicators for 81 municipalities.
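The role of the Jacobian can be illustrated with a minimal sketch. It assumes a Box-Cox transformation and is not the thesis's adjusted cAIC: the function names are hypothetical, and the conditional log-likelihood and the penalty (for example, effective degrees of freedom) are treated as inputs supplied by a fitted model. The only point shown is that adding the log-Jacobian of the transformation puts likelihoods from differently transformed responses on the scale of the original data.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transformation for strictly positive y."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def log_jacobian_box_cox(y, lam):
    """Log-Jacobian of the Box-Cox map: dT/dy = y**(lam - 1)."""
    y = np.asarray(y, dtype=float)
    return (lam - 1.0) * np.sum(np.log(y))

def jacobian_adjusted_criterion(cond_loglik_transformed, y, lam, penalty):
    """AIC-type criterion on the original data scale (illustrative only).

    cond_loglik_transformed: conditional log-likelihood of the model fitted
        to box_cox(y, lam); assumed to come from an external fit.
    penalty: assumed complexity penalty, also supplied externally.
    """
    loglik_original_scale = cond_loglik_transformed + log_jacobian_box_cox(y, lam)
    return -2.0 * loglik_original_scale + 2.0 * penalty
```

With lam = 1 the log-Jacobian is zero, so the criterion reduces to the usual untransformed form; for any other lam the Jacobian term is what makes candidates with different transformations comparable.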
Chapter 2 - Estimation of the consumer price index with regional weights using small area estimation methods: A case study of Germany
The consumer price index (CPI) is an important indicator for formulating effective policies related to wage and inflation control. Most countries regularly produce a national CPI, and some also publish the CPI at sub-national levels. This sub-national (or regional) CPI conveys region-specific information and is therefore a helpful tool for local policymakers. Germany provides a regional CPI for all 16 states, but with national product weights. National product weights do not adequately represent the importance of products at the regional level, whereas a better regional CPI uses regional product weights to reflect regional specifics more accurately. In this study, I explore the estimation of such a regional CPI with regional weights by using an accessible income and consumption survey from Germany. To obtain reliable regional weights, I focus on estimating regional expenditures for each product. The regional weight of each product is derived as the ratio of the estimated expenditure on that product to the total expenditure. However, estimating regional expenditures is challenging because the sample size in each region is small, and small sample sizes potentially produce unreliable estimates. To address this problem, I propose a small area estimation approach based on multivariate Fay-Herriot (MFH) models. Using the case of Germany, I show how model-based estimation of regional expenditures with MFH models improves the reliability of the estimates, and I discuss limitations as well as future research directions.
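The weight construction described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the thesis's procedure: the function names are hypothetical, the product categories and numbers are toy values invented purely for illustration, and the index is a plain weighted average of price relatives. In practice the expenditures would be the model-based regional estimates.

```python
def regional_weights(expenditure):
    """Weight of each product = its expenditure / total expenditure."""
    total = sum(expenditure.values())
    if total <= 0:
        raise ValueError("total expenditure must be positive")
    return {product: value / total for product, value in expenditure.items()}

def weighted_price_index(weights, price_relatives, base=100.0):
    """Weighted average of price relatives (current price / base price)."""
    return base * sum(weights[p] * price_relatives[p] for p in weights)

# Toy values for one hypothetical region (illustration only, not real data).
toy_expenditure = {"food": 300.0, "housing": 500.0, "transport": 200.0}
toy_relatives = {"food": 1.10, "housing": 1.02, "transport": 1.05}

toy_weights = regional_weights(toy_expenditure)
toy_index = weighted_price_index(toy_weights, toy_relatives)
```

Because the weights are ratios of estimates, any noise in the regional expenditure estimates carries over directly into the index, which is why the reliability of those estimates matters.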
Chapter 3 - Small area estimation using geospatial data based on transformed two-fold nested error regression models
Fighting poverty starts with identifying where exactly poverty is most severe by estimating poverty indicators. For a precise estimation of indicators at disaggregated regional levels, a small area estimation (SAE) approach is essential. SAE methods require a survey and an auxiliary dataset. By combining the two datasets, SAE methods overcome the problem of small sample sizes and produce reliable estimates at the small area level. A census dataset is commonly used as auxiliary data to estimate poverty indicators. However, data protection laws often impede access, and even a recent census may already be outdated because conditions in developing countries change quickly. An older census is therefore inappropriate as auxiliary data. Geospatial data is a viable alternative since it is freely available, up to date, and covers all inhabited areas. Still, a central challenge remains: geospatial data is collected at grid level, which is usually larger than a household but smaller than a small area. Since traditional SAE models use either a household level model or an area level model, we need to investigate which SAE model makes the best use of grid level data. We suggest the two-fold nested error regression model as it allows two random effects at different regional levels. The additional random effect captures the hierarchical data structure more accurately than standard SAE models. Additionally, we introduce transformations to the two-fold model to correct violations of the distributional model assumptions and to estimate ratio type indicators. We furthermore propose an estimation method for mean squared errors (MSE) with the transformed two-fold model. We show an efficiency gain of the two-fold model compared to the standard Fay-Herriot model and find that the proposed MSE estimator performs reasonably. Lastly, we apply the proposed transformed two-fold model to the geospatial data of Mozambique to estimate poverty indicators for 161 districts.
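For orientation, a two-fold nested error regression model with a transformed response can be written in the following generic form. The notation is an assumption made here for illustration and may differ from the thesis: i indexes areas, j indexes subareas (for example, grid cells) within area i, T is a transformation of the response, and all random terms are taken to be mutually independent. In a subarea-level variant, the variance of the error term would typically be a sampling variance that differs by subarea.

```latex
% Generic two-fold nested error regression model (notation assumed)
% i = area, j = subarea within area i, T = transformation of the response
T(y_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + v_{ij} + e_{ij},
\qquad
u_i \sim N(0, \sigma_u^2), \quad
v_{ij} \sim N(0, \sigma_v^2), \quad
e_{ij} \sim N(0, \sigma_e^2)
```

Here u_i is the area-level random effect and v_{ij} is the subarea-level random effect, which together give the two random effects at different regional levels.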
GND Keywords:
Poverty (Armut)
Indicator (Indikator)
Economic indicator (Wirtschaftsindikator)
Regional disparity (Regionale Disparität)
Estimation (Schätzung)
Statistical analysis (Statistische Analyse)
Keywords:
Box-Cox transformation
Empirical best predictor
Indicators
Small area estimation
inflation
regional inequality
regional difference
price statistics
household consumption
head count ratio
Mozambique
satellite imagery
remote sensing data
subarea level model
DDC Classification:
RVK Classification:
Type:
Doctoral thesis
Activation date:
October 25, 2024
Permalink
https://fis.uni-bamberg.de/handle/uniba/103063