Discovery Files

Democratizing data: Discovering data use and value for research and policy

Harvard Data Science Review spotlights NCSES-supported pilot

Data access and discovery continue to be topics of great interest across the federal data ecosystem. The National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation supported a pilot study to understand how public data are used in research publications, knowledge that could help the government make more transparent, informed decisions about public investments. This project, the "Democratizing Data Search and Discovery Platform," was the subject of a recent special issue of the Harvard Data Science Review, which highlighted the findings, successes and lessons learned from the pilot and explored the potential of technology and artificial intelligence to build on these initial efforts.

As part of the special issue, NCSES Director Emilda Rivers and Science Advisor May Aydin co-authored an article that discusses the potential for using AI to classify data sets, a particular challenge at NCSES, where data labels do not always map to the agency's mission categories. Natural language processing, semantic analysis and machine learning could potentially be used to develop different sets of metadata topic labels for different needs.

Rivers, alongside the three other agency heads who participated in the pilot study, sat down for a fireside chat, hosted by former U.S. Chief Statistician Nancy Potok, to explore the significance of understanding how data are used to inform decision-making.

"It's very important for us that we're able to talk about where our data are being used and how they are meeting the needs of not only NCSES, but the National Science Board and the National Science Foundation," Rivers said during the fireside chat. "[Data from the pilot] can provide a lot of insights into the type of audiences that both we reach and those that we don't."