Insights
Expert Interview

Leveraging Paradata and AI to Improve Survey Participation Rates

January 6, 2025

The steady decline in survey response rates has been a major concern for many researchers for some time. Low response rates not only erode the quality of surveys and increase nonresponse bias but also affect costs. In search of a solution, Westat embarked on a study last year for the Agency for Healthcare Research and Quality (AHRQ) to identify viable contact strategies by harnessing paradata with machine learning (ML). Westat’s Gizem Korkmaz, PhD, Associate Vice President, provided guidance to the data science team and oversight of the study. In late October, the study’s findings were presented at the 2024 Federal Committee on Statistical Methodology (FCSM) Research and Policy Conference in Maryland. Here, Korkmaz discusses the potential, challenges, and future of combining paradata with ML to address declining survey response rates.

Q. What role do paradata play in survey data collection, and why are they increasingly important in this era of declining response rates?

A. Paradata are process data generated during survey data collection. They include a vast amount of information on when and how sampled persons are contacted and the outcome of each contact attempt. If analyzed effectively, paradata combined with ML can help us identify strategic ways to contact households and improve response rates. Specifically, these analyses can help us narrow down the time of day or week and the successful contact modes used to reach each survey participant, based on the individual’s demographic characteristics or where they live—for example, whether they live in an urban or rural area.

Q. What are some challenges in using paradata to develop adaptive data collection strategies, especially in a multi-contact approach across various modes?

A. The major challenge in using paradata is managing their complexity, as they reflect variability in contact patterns and outcomes. The lack of standard methods for analysis adds to the difficulty. So, we have to be very careful in selecting the appropriate paradata, running models to analyze them, and then transferring our results to identify new contact strategies.

One key challenge for us is effectively interpreting the data. As data scientists, my team brings technical expertise, but we rely on collaboration with survey and field experts to contextualize the observations. Their insights are essential in interpreting the data and findings and in ensuring that any recommended changes to contact strategies are operationally practical and feasible. This process requires an interdisciplinary, collaborative team effort.

Q. How does applying ML techniques, such as clustering and decision trees, help identify and understand paradata patterns?

A. When we apply unsupervised ML methods, such as clustering, to automatically group households based on their paradata and then combine the resulting groups with data visualization techniques, we can observe patterns for efficient contact strategies, especially among demographically different households.

To enhance the effectiveness of ML techniques, it’s important to also include external data, such as information from the American Community Survey. These data sources can provide context about survey participants, such as income distribution, education levels, and other socioeconomic and demographic factors.
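To make the clustering idea concrete, here is a minimal sketch of grouping households by their contact paradata. It is an illustration only, not the method used in the study: the household IDs, the two features (share of successful contacts made in the evening and share made by phone), and all values are invented, and the small k-means routine is a stand-in for whatever clustering method a project might actually choose.

```python
from statistics import mean

# Hypothetical paradata summaries, one row per household (all values invented):
# (share of successful contacts made in the evening, share made by phone)
HOUSEHOLDS = {
    "H01": (0.90, 0.80),
    "H02": (0.85, 0.90),
    "H03": (0.80, 0.75),
    "H04": (0.10, 0.20),
    "H05": (0.20, 0.10),
    "H06": (0.15, 0.25),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def cluster_households(households, k=2, iterations=10):
    """Basic k-means: assign each household to its nearest centroid,
    then move each centroid to the mean of its members."""
    ids = list(households)
    points = [households[h] for h in ids]
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(mean(dim) for dim in zip(*members))
    return dict(zip(ids, labels)), centroids


labels, centroids = cluster_households(HOUSEHOLDS)
for household_id, label in labels.items():
    print(household_id, "-> cluster", label)
```

In this toy data, one cluster gathers households reached mostly in the evening by phone and the other gathers households reached mostly at other times and by other modes. Real paradata would call for many more features, careful scaling, and an established library such as scikit-learn, and area-level context such as American Community Survey measures could be added as extra feature columns.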

Q. What are some of the benefits for clients in using these ML-driven insights to inform their contact strategies?

A. Contact strategies informed by ML insights are more likely to work in real-world conditions, improve response rates, and decrease data collection costs. ML models can also quickly, efficiently, and more accurately identify patterns, saving time and resources for clients.

Q. How does this research demonstrate Westat’s leadership and innovation in data collection methodology, particularly in adaptive survey designs?

A. Through our leadership, innovation, and collaboration among expert data scientists, survey methodologists, and field operations staff, we successfully identified potential solutions. In the spring, we will be evaluating our work for the Medical Expenditure Panel Survey (MEPS) to see how the ML models’ predictions on best times to contact households affected response rates.

Q. Looking ahead, what other potential applications do you foresee for ML and paradata analysis?

A. The use of ML, and AI in general, holds great promise for many aspects of survey methodology, from designing questionnaires and automating response coding to developing open-ended questions, assessing response quality, and disseminating data products. However, human oversight will remain essential to ensure the validity of the statistics produced.
