Innovative Data Integration Methodology Developed With Multilevel Modeling
February 8, 2022
Westat researchers tackle an increasingly common problem: scientific and policy questions often call for granular statistics on a set of variables, but each variable may be collected only in part, and in separate studies. In new research published in The Canadian Journal of Statistics, "Statistical data integration using multilevel models to predict employee compensation," they present a novel way to integrate such data.
Coauthors Andreea Erciulescu, Ph.D., Jean Opsomer, Ph.D., and Benjamin J. Schneider, M.S., propose a statistical model that integrates 2 surveys’ key data on employee compensation as an example of their approach. The proposed model reconciles the surveys’ estimates for the common variable and uses the relationship between the 2 variables of interest to enable estimation of both variables’ means for all of the subgroups represented in at least one of the surveys. The precision of each variable’s estimate in each subgroup is improved by “borrowing strength” from data available for the other variable and from data available in other subgroups.
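To make the idea of "borrowing strength" concrete, here is a minimal sketch in Python. It is not the authors' model: it uses a much simpler, single-variable shrinkage estimator of the kind common in small area estimation, and it borrows only across subgroups, not across the 2 variables. The function name, the subgroup values, and the between-subgroup variance (treated here as known) are all hypothetical and chosen purely for illustration.

```python
def shrink_estimates(direct, samp_var, between_var):
    """Pull each subgroup's direct survey estimate toward a pooled mean.

    direct      -- direct estimates, one per subgroup (hypothetical values)
    samp_var    -- sampling variance of each direct estimate
    between_var -- assumed variance of true subgroup means around the pooled
                   mean (treated as known here; in practice it is estimated)
    """
    # Pooled mean: subgroups with more precise estimates get more weight.
    weights = [1.0 / (between_var + v) for v in samp_var]
    pooled = sum(w * y for w, y in zip(weights, direct)) / sum(weights)

    results = []
    for y, v in zip(direct, samp_var):
        # gamma near 1: trust the subgroup's own data; near 0: lean on the pool.
        gamma = between_var / (between_var + v)
        estimate = gamma * y + (1.0 - gamma) * pooled
        # Approximate variance, ignoring uncertainty in the pooled mean.
        variance = gamma * v
        results.append((estimate, variance))
    return pooled, results


# Made-up numbers: three subgroups, the second with a very noisy estimate.
direct = [30.0, 42.0, 35.0]
samp_var = [1.0, 16.0, 4.0]
pooled, shrunk = shrink_estimates(direct, samp_var, between_var=4.0)

for y, v, (est, var) in zip(direct, samp_var, shrunk):
    print(f"direct {y:5.1f} (var {v:4.1f}) -> shrunk {est:6.2f} (var {var:4.2f})")
```

The noisiest subgroup moves furthest toward the pooled mean and gains the most precision, which is the sense in which a subgroup with little data of its own benefits from data collected elsewhere. The published model extends this idea to 2 related variables observed in different surveys.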
This work bridges the statistical fields of data integration and small area estimation. Although it is motivated by a statistical data integration problem, it builds upon small area estimation approaches developed to yield precise survey estimates for population subgroups with small sample sizes. The proposed model also makes a novel contribution to small area estimation by using unmatched sets of population subgroups in the model specification. The result is an innovative synthesis of methods from these 2 fields, with the potential to improve official statistics related to employee compensation.
“Combining the information from the 2 surveys is necessary to produce a unique set of statistics for each of the 2 variables for all the domains of interest,” notes Dr. Erciulescu. “The multilevel model we introduce does this, and the application demonstrates how the availability of employee compensation at a granular level is improved.”