Many research and evaluation questions can be answered using extant data from surveys, program evaluations, and administrative data. Data catalogs help support the use of extant data by providing researchers with an overview of different datasets and information about how to access them. Developing and maintaining data catalogs requires considerable time and resources to ensure they are thorough and accurate.
Our team from Westat and the American Institutes for Research (AIR), on behalf of the U.S. Department of Labor (DOL), explored the potential of automated machine learning (ML) algorithms to create and maintain data catalogs for employment and training outcomes. We conducted a literature review, piloted a manual data catalog assembly process, and consulted with a technical working group of computer science experts.
While current ML innovations present challenges for automating data catalog development, there are promising solutions. Data sources vary in their structures and in the amount of metadata available, making it difficult to develop an automation program. Moreover, automating even a portion of the data catalog development process would require large investments in staff and computing resources. However, artificial intelligence (AI) is a rapidly evolving field, and the literature suggests there may be many opportunities to support automated data collection in the future, including the potential use of generative AI. Federal agencies might explore and use existing tools to meet their needs.
Our brief, Explorations in Data Innovations: Can Machine Learning Support Data Catalog Development? (PDF), details the data catalog development process, options for automation, and recommendations for future explorations to eventually harness artificial intelligence. When using machine learning to produce public-facing products, Federal agencies may need a mix of staff with different skill sets, including data scientists, website developers, experts in cloud computing, and subject matter experts.
Allison Hyra, PhD, Westat Associate Vice President of Social Policy and Economics Research and the project director for this work, summed up the study: “This was a great collaboration with the staff of DOL’s Chief Evaluation Office (CEO) not only because we explored the extent to which new technology can support researchers, but because CEO wanted to share our lessons learned despite finding automated data catalogs are currently infeasible. By engaging the field and identifying next steps, Westat, AIR, and CEO are contributing to realizing a future where we can further democratize access to datasets that answer pressing employment and training information needs.”