The steady decline in survey response rates has been a major concern for many researchers for some time. Low response rates not only erode the quality of surveys and increase nonresponse bias but also affect costs. In search of a solution, Westat embarked on a study last year for the Agency for Healthcare Research and Quality (AHRQ) to identify viable contact strategies by harnessing paradata with machine learning (ML). Westat’s Gizem Korkmaz, PhD, Associate Vice President, provided guidance to the data science team and oversight of the study. In late October, the study’s findings were presented at the 2024 Federal Committee on Statistical Methodology (FCSM) Research and Policy Conference in Maryland. Here, Korkmaz discusses the potential, challenges, and future of combining paradata with ML to address declining survey response rates.
Q. What role do paradata play in survey data collection, and why are they increasingly important in this era of declining response rates?
A. Paradata are process data generated during survey data collection. They include a vast amount of information on when and how sampled persons are contacted and the outcome of each contact attempt. If analyzed effectively, paradata combined with ML can help us identify strategic ways to contact households and improve response rates. Specifically, these analyses can help us identify the time of day or day of the week and the contact modes most likely to reach each survey participant, based on the individual’s demographic characteristics or where they live, for example, whether they live in an urban or rural area.
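To make the idea concrete, here is a minimal sketch of how contact-attempt paradata might be summarized before any ML is applied. The records, field names (`area_type`, `time_slot`, `mode`), and functions are hypothetical illustrations, not the study’s actual data or method. The sketch computes the share of attempts that reached someone for each combination of area type, time slot, and mode, then picks the best-performing combination per area type.

```python
from collections import defaultdict

# Hypothetical contact-attempt records (not real study data):
# (area_type, time_slot, mode, contacted)
ATTEMPTS = [
    ("urban", "weekday_evening", "phone", True),
    ("urban", "weekday_evening", "phone", False),
    ("urban", "weekday_morning", "phone", False),
    ("urban", "weekday_morning", "phone", False),
    ("rural", "weekend_afternoon", "in_person", True),
    ("rural", "weekend_afternoon", "in_person", True),
    ("rural", "weekday_evening", "phone", False),
    ("rural", "weekday_evening", "phone", True),
]


def contact_rates(attempts):
    """Return the share of successful attempts per (area, slot, mode)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for area, slot, mode, contacted in attempts:
        cell = counts[(area, slot, mode)]
        cell[0] += int(contacted)
        cell[1] += 1
    return {key: successes / total for key, (successes, total) in counts.items()}


def best_strategy(attempts):
    """Return the highest-rate (slot, mode, rate) for each area type."""
    best = {}
    for (area, slot, mode), rate in contact_rates(attempts).items():
        if area not in best or rate > best[area][2]:
            best[area] = (slot, mode, rate)
    return best


if __name__ == "__main__":
    for area, (slot, mode, rate) in best_strategy(ATTEMPTS).items():
        print(f"{area}: try {mode} on {slot} (contact rate {rate:.0%})")
```

In practice, rates based on only a handful of attempts would be too noisy to act on, which is one reason model-based approaches and expert review matter.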
Q. What are some challenges in using paradata to develop adaptive data collection strategies, especially in a multi-contact approach across various modes?
A. The major challenge in using paradata is managing their complexity, as they reflect variability in contact patterns and outcomes. The lack of standard methods for analysis adds to the difficulty. So, we have to be very careful in selecting the appropriate paradata, running models to analyze them, and then transferring our results to identify new contact strategies.
One key challenge for us is effectively interpreting the data. As data scientists, my team brings technical expertise, but we rely on collaboration with survey and field experts to contextualize the observations. Their insights are essential in interpreting the data and findings and ensuring that any recommended changes to contact strategies are operationally practical and feasible. This process requires an interdisciplinary collaborative team effort.
Q. How does applying ML techniques, such as clustering and decision trees, help identify and understand paradata patterns?
A. When we apply unsupervised ML methods, such as clustering, to automatically group households based on their paradata and then combine them with data visualization techniques, we can observe patterns for efficient contact strategies, especially among demographically different households.
To enhance the effectiveness of ML techniques, it’s important to also include external data, such as information from the American Community Survey. These data sources can provide context about survey participants, such as income distribution, education levels, and other socioeconomic and demographic factors.
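As a rough illustration of the clustering step described in this answer, the sketch below groups households with a small k-means routine written in plain Python. K-means is chosen here only as a familiar example of unsupervised clustering; the interview does not say which algorithm the team used. The feature values are hypothetical and assumed to be pre-scaled to the 0 to 1 range: the first is the share of a household’s successful contacts that happened in the evening, and the second is a scaled count of attempts needed before first contact.

```python
import math

# Hypothetical, pre-scaled household features (not real study data):
# (evening_contact_share, attempts_before_first_contact_scaled)
HOUSEHOLDS = [
    (0.90, 0.20),
    (0.20, 0.80),
    (0.85, 0.25),
    (0.15, 0.90),
    (0.80, 0.10),
    (0.25, 0.85),
]


def kmeans(points, k, iterations=20):
    """Minimal k-means; uses the first k points as starting centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(HOUSEHOLDS, k=2)
    for household, label in zip(HOUSEHOLDS, labels):
        print(household, "-> cluster", label)
```

External context such as American Community Survey indicators could be appended as extra feature columns, provided they are scaled comparably; interpreting what each cluster means operationally would still fall to survey and field experts.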
Q. What are some of the benefits for clients in using these ML-driven insights to inform their contact strategies?
A. Contact strategies informed by ML insights are more likely to work in real-world conditions, improve response rates, and decrease data collection costs. ML models can also quickly, efficiently, and more accurately identify patterns, saving time and resources for clients.
Q. How does this research demonstrate Westat’s leadership and innovation in data collection methodology, particularly in adaptive survey designs?
A. Through our leadership, innovation, and collaboration among expert data scientists, survey methodologists, and field operations staff, we successfully identified potential solutions. In the spring, we will be evaluating our work for the Medical Expenditure Panel Survey (MEPS) to see how the ML models’ predictions on best times to contact households affected response rates.
Q. Looking ahead, what other potential applications do you foresee for ML and paradata analysis?
A. The use of ML, and AI in general, holds great promise for many aspects of survey methodology, from designing questionnaires and automating response coding to developing open-ended questions, assessing response quality, and disseminating data products. However, human oversight will remain essential to ensure the validity of the statistics produced.
Expert Interview
Leveraging Paradata and AI to Improve Survey Participation Rates
January 2025