America's DataHub Consortium: seeing — and understanding — the entire elephant
We live in the information age. So where are all the answers? A new data science consortium led by NSF's National Center for Science and Engineering Statistics wants to reveal the answers and evidence hidden in a sea of federally compartmentalized data.
The phrase "data-driven" is a modern cliché. It's generally used to characterize decisions or strategies as being based on some sort of objective data. But was that data actually relevant to the situation at hand, or was it missing something important? Much like the ancient parable of the blind men and the elephant, it's all too easy to rely on data that is incomplete or lacking in context; one person touches the trunk while another touches a tusk and they come away with two very different conclusions about the animal. Both data-driven, neither correct.
By simultaneously examining the entire "data animal" from every angle — trunk to tail — what could be learned? A new partnership of public and private data-research organizations aims to find out. Led by the U.S. National Science Foundation's National Center for Science and Engineering Statistics, or NCSES, America's DataHub Consortium is exploring what is possible when complex data from different sources are linked and analyzed in new ways.
"The secret sauce of America's DataHub is bringing people together who normally would be in their siloed places working on important problems independently," says NCSES Director Emilda Rivers.
That means linking data stored across the federal government, including within its 13 principal statistical agencies, explains Rivers. Each of those agencies collects statistical information for its own area of interest, such as health, economics, labor, agriculture or energy.
"We're a big country," says Rivers of the broad challenges in collecting critical data about so many diverse aspects of American society. "Our statistical agencies focus on meeting their individual missions and they do it extremely well," she says. "But we need more collaboration."
As the lead agency for America's DataHub, NCSES is building upon its expertise in linking disparate data sources to understand the progress and trajectory of science and engineering in the U.S. and globally.
"The idea behind America's DataHub has a long lineage at NCSES because we do it all the time," says NCSES Deputy Director Vipin Arora. "Getting access to different data sources, bringing them together, linking them to do analysis — this is the kind of work we do on statistical products like the Science and Engineering Indicators and the Women, Minorities, and Persons with Disabilities in Science and Engineering reports."
NCSES's ultimate goal is ambitious: Un-stovepipe the nation's elephant-sized treasure trove of data so that leaders at the federal, state and local levels can use it to understand the issues they face and make informed decisions that help their communities and citizens.
Building evidence from a sea of data
It is hard to overstate the value of reliable, useful information when contemplating a difficult decision. That is true for just about everyone, including business owners, entrepreneurs, educators, healthcare providers and members of Congress. In fact, Congress formally recognized the value of data in 2018 with the Foundations for Evidence-Based Policymaking Act. The act required federal agencies to develop methods and analytical approaches for building evidence that supports policymaking and to make their data "accessible and useful to the public."
"America's DataHub is about building evidence, writ large," says Arora.
That evidence can be used to identify the best paths toward high-value goals, such as accelerating technological innovation or expanding job opportunities at the local or even national level.
That grand mission comes with comparably grand technical challenges, such as how to access and link myriad sets of statistical data about everything from automobile ownership rates to the average annual income for farmers to community college enrollment numbers.
"The 2018 evidence-based policymaking act has something called the presumption of accessibility," explains Arora. "The idea that statistical agencies can go out and get data from any agency to use. But that's not really been tested. So how do you do that? Where are the choke points? Figuring that out is a big part of the innovation that's going to happen."
The really big questions
To start solving those puzzles, NCSES has identified some complex questions that the consortium partners of America's DataHub are digging into now. The first task is to analyze the availability of and demand for scientists and engineers on a global scale. That includes building evidence to fully understand the public value of recruiting scientists and engineers from other countries and training them in U.S. universities and labs. The consortium of public and private organizations undertaking this project through America's DataHub includes Accenture Federal Services, Clarivate, The Coleridge Initiative, NORC at the University of Chicago, and RTI International.
"Those aren't small questions," says Arora. For example, if the U.S. funds the education and training of foreign-born scientists and engineers, what is the total benefit to U.S. taxpayers now and into the future? How many jobs and new industries would be generated? How many resulting inventions, medical therapies, must-have gizmos and other innovations would be created in the U.S. versus other countries? How many American kids might be inspired to pursue a career in science and engineering?
"This question about foreign-born scientific talent is far-reaching," adds Rivers. "The evidence that America's DataHub is building could have a huge impact, like how we set up our graduate degree programs in the U.S. and the sorts of visa policies we put in place.
"It can also help us broaden participation in STEM and understand where the 'missing millions' are," she says, describing Americans who are not part of the science and engineering workforce because they have not had the necessary educational and professional opportunities.
Ideas + innovation + data geeks
"What excites me is that we're connecting NSF to our citizenry," says Keith Boyea, deputy director of NSF's Division of Acquisition and Cooperative Support and one of many NSF staff members working behind the scenes on the contractual underpinnings of America's DataHub. "We are reaching people that we've never reached before."
Some of those people include the staff and leaders of state and local governments.
"State and local governments provide data to federal agencies that go into our official statistics," says Rivers. "But right now, they can't get that data back to use for their own state or county. Imagine a scenario in which they can securely link to data and use it for state and local government decision-making.
"That means putting data in the public's hands. It's not just people who are data geeks like me that can have access to data. That's what I see America's DataHub growing into."
When more people with different perspectives have access to the best data, they bring ideas and innovation that can lead to original ways to solve problems.
"Ideas plus innovation plus data geeks equals America's DataHub," says Rivers. "So, bring your ideas."