Dear Colleague Letter

National AI Research Resource (NAIRR) Pilot seeks datasets to facilitate AI education and researcher skill development

This Dear Colleague Letter seeks datasets for the National Artificial Intelligence Research Resource (NAIRR) Pilot to broaden AI resource access for researchers and educators, particularly newcomers, and to foster a robust AI ecosystem that drives innovation and workforce development.

November 15, 2024

Dear Colleagues:

The National Artificial Intelligence Research Resource (NAIRR) Pilot, led by the U.S. National Science Foundation (NSF) in collaboration with 12 other agencies, is seeking datasets that will enable a broader set of researchers and educators, particularly newcomers to AI, to join the AI research and education community. This Dear Colleague Letter (DCL) seeks to identify datasets that could be used by researchers and educators to facilitate the development of AI skills in the classroom, at a workshop, or in other educational environments to further develop the nation’s AI workforce. The criteria that will be used to choose the datasets are provided below under How to Respond to this DCL. NSF plans to point to these datasets through the NAIRR Pilot portal.

The NAIRR Pilot was launched in January 2024 to demonstrate the value and potential impact of the NAIRR vision as described in the NAIRR Task Force Report. The vision for the NAIRR is to provide the research and education communities with access to critical resources to power AI innovation and discovery while building a trustworthy AI ecosystem. NAIRR Pilot activities include facilitating researcher and educator access to computing, high-quality datasets, software, models, and expertise; integrating data, software, platforms, and tools; expanding resource access to researchers and educators who are new to developing AI or to integrating it into their foundational, use-inspired, and translational research; and building a NAIRR Pilot ecosystem that will promote learning, research, and innovation. These activities are administered by designated working groups and overseen by the NAIRR Pilot Program Management Office (PPMO).

HOW TO RESPOND TO THIS DCL

To recommend candidate datasets for inclusion on nairrpilot.org, please answer the questions in the Touchpoints form. The form has 15 questions across six categories: General Information, AI Use Cases, Building a User Community, Metadata and Documentation, User Support and Training, and Data Policy. The Touchpoints questionnaire is hosted by the General Services Administration. The full list of questions is included at the bottom of this DCL.

The following criteria will be used to evaluate submissions.

AI USE CASES: Can the dataset be used to provide innovative education and learning opportunities to advance a more trustworthy AI ecosystem in a NAIRR Pilot high-priority area including fundamental AI research, human health, public infrastructure, advanced manufacturing, environment and climate challenges, or other science domains?

USER COMMUNITY: How would the inclusion of this dataset in the NAIRR Pilot help build and diversify the user community and help expand the use of AI by a diverse group of users?

METADATA AND DOCUMENTATION: Is the dataset accessible, and are its documentation and metadata adequately described and available to the user community? Is contextual information provided for the dataset, e.g., why, where, and how the data were collected?

USER SUPPORT AND TRAINING: Are any user-support programs, educational workshops, or events planned or associated with this dataset?

DATA POLICY: Have all appropriate procedures been followed for the collection and dissemination of the dataset? For example, if the dataset includes any human-subject, controlled, sensitive, or proprietary data, were appropriate IRB procedures followed? Furthermore, are all licenses, user agreements, or restrictions clearly set out for users to review and acknowledge? Has funding for the dataset been described?

Submissions may be reviewed by NSF program staff, other federal employees from NAIRR Pilot collaborating agencies, or external reviewers. NSF will use the information submitted in response to this DCL at its discretion and may not provide comments on a respondent's submission. The information provided will be analyzed, may appear in reports, and may be shared publicly on agency websites. Respondents are advised that the government is under no obligation to acknowledge receipt of the information or provide feedback to respondents with respect to any information submitted. No proprietary, classified, confidential, or sensitive information should be included in your response. The government reserves the right to use any non-proprietary technical information in any resultant solicitation(s), policies, or procedures.

If datasets are deemed acceptable for the NAIRR Pilot, nairrpilot.org will link to them so that users can access them. A standard disclaimer will be noted for users: Neither NSF nor the awardee managing nairrpilot.org validates or endorses any of the individual datasets or models; they are provided to the community as potential resources that could advance NAIRR Pilot goals.

Teams located in Established Program to Stimulate Competitive Research (EPSCoR) jurisdictions and teams from Minority Serving Institutions are encouraged to respond to this opportunity.

Submission deadline: February 7, 2025

Sincerely,

Susan Marqusee, Assistant Director
Directorate for Biological Sciences (BIO)

Gregory Hager, Assistant Director
Directorate for Computer and Information Science and Engineering (CISE)

James L. Moore III, Assistant Director
Directorate for STEM Education (EDU)

Susan Margulies, Assistant Director
Directorate for Engineering (ENG)

Alexandra R. Isern, Assistant Director
Directorate for Geosciences (GEO)

David Berkowitz, Assistant Director
Directorate for Mathematical and Physical Sciences (MPS)

Kaye Husbands Fealing, Assistant Director
Directorate for Social, Behavioral and Economic Sciences (SBE)

Erwin Gianchandani, Assistant Director
Directorate for Technology, Innovation, and Partnerships (TIP)

Alicia J. Knoedler, Head,
Office of Integrative Activities (OIA)

Questions asked on the Touchpoints form:

GENERAL INFORMATION

What is the name of the dataset? Can you briefly describe the dataset for people unfamiliar with it? If your dataset already has a DOI (digital object identifier), please enter it here.
Please enter the name of the organization submitting the dataset and the name and the institutional e-mail address of one or more contacts.
Please enter the URL for accessing the dataset (if public) or for learning about the dataset. If there is an API, please describe it briefly.

AI USE CASES

What use cases does this dataset support? What use cases was this dataset designed to support? Please describe some current or possible AI use cases.
Describe how this dataset can be used to facilitate AI education and researcher skill development. For example, how can the dataset be used to broaden access to AI resources in classrooms or other educational and training environments, for new as well as experienced AI researchers and/or educators?

USER COMMUNITY

Describe the current community of users of the dataset. [For example, please describe the user community with respect to (i) the user’s institution (e.g., government, academia, industry) and geographic location; and (ii) the degree of sophistication in applications and/or training (e.g., new to AI, user with some knowledge of AI, somewhat advanced user of AI, experienced AI researcher/educator, AI innovator)].
How many current users are there for the dataset? If you don't know, please give your best estimate, and address how you arrived at it in the answer to the next question. [radio buttons: 1-9; 10-99; 100-499; 500-999; 1,000-4,999; 5,000-9,999; 10,000+]
How do you envision the growth of the user community under the NAIRR Pilot program?
Are there training/educational or user-support programs associated with or planned for this dataset? Please indicate what would be made available to NAIRR users.

METADATA AND DOCUMENTATION

Describe any metadata, metadata standards, data/model cards/sheets, and documentation and how they can be accessed. You may also list publications about the dataset or representative publications citing the dataset.
Describe any data quality assessments that have been conducted using this dataset that discuss the utility, objectivity, and integrity of the data.

DATA POLICY

Please let us know in this section if your dataset includes any human-subject, controlled, sensitive, or proprietary data.

What policies and procedures have been followed in the collection of this dataset (e.g., copyright or IRB approval)?
Describe any licenses, user agreements, or restrictions that may apply to the use of the dataset. Are these clearly set out for users to review and acknowledge? If so, please describe where these licenses, agreements, and restrictions can be found.

ADDITIONAL INFORMATION

Please list here all current and past sources of funding (non-governmental or governmental) for this dataset, including the funding entities and, if relevant, grant numbers.
Please use this space to add any additional information.