Developing the 21st century data science workforce

NSF program aims to expand data science education pathways while solving community issues 

By Jason Bates

Timely and effective use of data provides a significant competitive advantage in today's society. Manufacturing, retail, healthcare, agriculture and other sectors are using data-driven insights to respond to changes in markets, improve decision-making and operate more efficiently.

The growth of the internet and connected devices is making it easier for companies, organizations and government agencies of all sizes to access and capitalize on this data for the benefit of all. But many groups are not able to take advantage because they lack the skilled staff and resources needed. There are several challenges in filling the workforce gap between academia and industry. Often, the data collected by organizations are simply not useful, or the data analytics skills needed to use the data are unavailable.

The U.S. National Science Foundation is helping tackle this problem, supporting programs designed to help students add data science expertise to their skillset while also helping community-based and nonprofit organizations take advantage of the data they possess. The Data Science Corps program, or DSC -- a component of NSF's Harnessing the Data Revolution Big Idea -- works with the academic community to bring students and local organizations together to use available data to solve problems. This will address challenges in the community and transform data science education to develop a 21st-century data-capable and diverse workforce. 

Growing the data science workforce 

One approach to expanding the talent pool is to engage more students outside of traditional data science programs. In 2018, Purdue University established The Data Mine, a university-wide community that teaches data science to participating undergraduates from all majors. The HDR DSC: National Data Mine Network award to the American Statistical Association will expand this effort using hybrid platforms to train a cohort of 300 students at dozens of partner institutions across the nation. The students will use computing to solve data-driven challenges in every sector of industry, including biomedical engineering, healthcare engineering, image processing, manufacturing, supply chain management and transportation.

"It's hard to overstate the need for people with data science acumen and some data-specific skills, such as someone who has studied agriculture and wants to study data science and go into precision agriculture," said Mark Daniel Ward, a professor at Purdue and director of The Data Mine. "The best talents will go to the best tech-oriented companies, so it's hard for small and mid-size companies to recruit and retain the best and brightest."

The program has partnerships with 50 organizations that want to work with students, and Ward wants to have more than 100 partners when the first cohort of students enters the program in fall 2022. "Students don't have to major in data science to be part of this. We're making it accessible to all students and building a pipeline of data-fluent graduates," he said.

Helping local communities tackle issues 

A key element of the DSC awards is working with local organizations to replace theoretical data science training with real-world engagement. This helps local organizations uncover the useful information in the piles of data they possess and use it for the benefit of the community.

The HDR DSC: Collaborative Research: The Data Science WAV: Experiential Learning with Local Community Organizations award was granted in 2019 during the first set of DSC awards to focus on both parts of this challenge -- the inability of community-based and non-profit organizations to tackle data science problems and the lack of real-world experience for students. 

"This has also provided us with a better sense of what the skills need to look like for data scientists working with a particular community organization," said Nicholas Horton, a professor of statistics and data science at Amherst College. "What I've been the most excited about is that there have been very direct, tangible benefits for organizations and the students we've worked with. This has us thinking about curricular changes. The students are thinking in different ways, and we are thinking about ways we need to continue changing and adapting so that all students have experiences like this."

Working with a variety of organizations across the Pioneer Valley region in Massachusetts gives the students insight into the range of data science applications. It also shows them how those skills can have a positive impact across society, including work around bike share programs and climate and weather data analysis for local organizations.

"Every field is a tech field, and in this newest wave, every field needs data and data analysis," said Valerie Barr, the Jean E. Sammet professor of computer science at Mount Holyoke College. "What a program like this does is very complementary to the education data scientists want, but it's also a great experience for students who are not data science majors. We are getting a lot of applications from students in economics who are now doing computer science minors. They will be able to take that into the workplace, and it makes them more competitive in the market."

The project currently includes several institutions ranging from liberal arts and community colleges to a large public university, including two women's colleges and two Hispanic-Serving Institutions (HSIs). The DSC-WAV project is expanding data science skills to new groups and providing educational opportunities to more first-generation, low-income students who begin studies at one of the community colleges and then transfer to a four-year institution to complete their work.

"We want to make it very transparent for students to decide what courses to take in community college and then transfer," said Horton. "We know that nationwide there are a tremendous number of high school students taking data science courses. By creating transparent pathways to data science, we can let other students know what they need to do early on." 

Expanding data science education across institutions 

One of the major challenges in data science is that the communities most in need of data science help are also those historically underrepresented in STEM. Broadening participation to a wider array of academic institutions, from community colleges to minority-serving institutions, will help address that challenge and create a more diverse workforce.

The Collaborative Research: HDR DSC: DS-PATH: Data Science Career Pathways in the Inland Empire award brings together six institutions to advance data science education in the Inland Empire region in California. The partnership among universities and community colleges that are primarily Hispanic Serving Institutions, or HSIs, intends to create multiple opportunities for students in the region to learn data science skills and apply them with organizations in the Inland Empire.

The HDR DSC: AI across the statewide curriculum program award includes Florida A&M University -- one of the most prominent historically Black colleges and universities and among the largest producers of STEM graduates from underrepresented groups in the U.S. The program intends to develop a curriculum for students from outside traditional computer-science fields to learn artificial intelligence skills that can address critical emerging problems.

The Collaborative Research: HDR DSC: Infusion of Data Science and Computation into Engineering Curricula award will create an ecosystem of engineering students across organizations. Partnering groups include HBCUs; HSIs; institutions that are part of EPSCoR, or the Established Program to Stimulate Competitive Research; local industry and communities of practice; and even K-12 schools and teachers.

About the Author

Jason Bates
Senior Technical Writer/Editor

Jason is a Senior Technical Writer/Editor in the Office of Legislative and Public Affairs, National Science Foundation/FedWriters.