EAR Data Management and Sharing Plan Guidance

Overview

The U.S. National Science Foundation Division of Earth Sciences (EAR) is committed to achieving the broadest benefit of its research investments. Adherence to open, inclusive and transparent research practices, including those articulated through the FAIR guiding principles (findable, accessible, interoperable and reusable) and the CARE principles for Indigenous data governance (collective benefit, authority to control, responsibility and ethics), is critical for maximizing the scientific value of data, samples and other research products supported through EAR awards. The 2020 National Academies of Sciences, Engineering, and Medicine report, A Vision for NSF Earth Sciences 2020-2030: Earth in Time, noted that "FAIR data standards will improve the longevity, utility, and impact of EAR-funded data." In 2022, a memorandum published by the White House Office of Science and Technology Policy (OSTP) articulated the importance of ensuring free, immediate, and equitable access to federally funded research. In 2023, NSF published an updated Public Access Plan 2.0, which describes NSF's expected approach to achieving the goals of the OSTP memorandum. EAR believes that projects committed to open sharing of results and conducted in alignment with FAIR and CARE principles will accelerate scientific discovery, broaden data access and ensure reproducibility and replicability of research in the Earth sciences. Deposit of data and associated metadata (including sample-based metadata) in repositories that fulfill FAIR principles, as articulated in the OSTP "Desirable Characteristics of Data Repositories for Federally Funded Research" guidelines, is a straightforward way for Earth scientists to adhere to open science principles.

This document defines data and sample policies for all proposals submitted to and awards managed by EAR programs. These policies supplement NSF-wide requirements in the "Proposal and Award Policies and Procedures Guide" (PAPPG). In the PAPPG, NSF requires that all proposals include a Data Management Plan (DMP) describing how the project will conform to the PAPPG policy on dissemination and sharing of research results. NSF considers the DMP to be an integral part of the proposal, to be considered under intellectual merit and/or broader impacts, as appropriate, and as part of the proposal evaluation process. As such, EAR program directors ask proposal reviewers to carefully evaluate proposal DMPs relative to the guidance set forth in this policy, the PAPPG, relevant program solicitation(s), community-specific standards and open science principles (e.g., FAIR and CARE). During the period of the award, EAR awardees are responsible for adhering to the DMP, and EAR program directors monitor such adherence through annual and final project reports.

EAR requirements for Data Management Plans (DMPs)

This section summarizes key requirements for DMPs as described in the PAPPG and supplemented by this EAR policy. Specific guidance on how to achieve these requirements is provided below in "Data Management Plan (DMP) content for EAR proposals."

EAR requirements:

  1. Proposals must include a document of no more than two pages, titled "Data Management Plan," in the supplementary documentation section of the proposal. In cases of collaborative proposals or proposals involving subawards, the lead principal investigator (PI) submits a single DMP for the entire project. In cases where no data or samples will be produced (for example, in conference proposals), the DMP may simply state that no detailed plan is needed, as long as such statement is clearly explained.
  2. The DMP should demonstrate consistency with open science principles (e.g., FAIR and CARE) and community-specific standards. While variation in DMPs is expected across research communities, each DMP should be appropriate for the data and samples being generated and reflect community best practices. Deviations from open science principles and/or community standards must be justified.
  3. The DMP must address plans for all types of data and samples to be collected and/or generated through the proposal, including roles and responsibilities for managing such data and samples, as well as relevant metadata standards to be followed. EAR defines "data" and "samples" expansively while acknowledging differences across disciplines. Possible types of "data" to be addressed in the DMP include, but are not limited to: observational, experimental, analytical and model outputs; derived and compiled datasets; software and code; educational materials; and any other relevant digital products resulting from the project. Possible types of "samples" to be addressed in the DMP include, but are not limited to: physical samples and collections; drilling cores; specimens; and any other relevant physical, chemical and/or biological materials resulting from the project. For purposes of this policy, sample-derived digital products are considered "data."
  4. All new data resulting from the project must be made publicly accessible within two years after completion of data collection or generation via appropriate long-lived FAIR-compliant repositories. Expected timelines for data collection or generation may vary by data type and should align with appropriate disciplinary expectations. All new data collected via continuing observations, large-scale community projects or NSF Rapid Response Research (NSF RAPID) awards must be made accessible as close to the time of initial collection as is practicable. All data in support of peer-reviewed scholarly publications resulting from the project must also be made publicly accessible at or before the time of publication. Exceptions to this policy must be justified (e.g., if an appropriate repository does not exist, or if data access must be restricted). "Data available upon request" is not acceptable.
  5. Metadata describing all new samples resulting from the project must be publicly indexed within two years after sample collection is considered complete, via appropriate long-lived FAIR-aligned repositories. Metadata describing samples collected via continuing observations, large-scale community projects or NSF RAPID awards must be indexed and made accessible as close to the time of collection as is practicable. All sample metadata in support of peer-reviewed scholarly publications must also be publicly indexed at or before the time of publication. Publicly indexed sample metadata should specify provisions for sample access, including the expected period and location of sample preservation, preferably via a repository appropriate for the specific sample type. The samples themselves should also be made publicly accessible within the above timeframes; situations in which samples cannot be made publicly accessible should be explained.
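
As an informal planning aid, the timing rules in requirements 4 and 5 can be sketched as a small date calculation. This is an illustration under stated assumptions, not an official NSF tool: the function names are hypothetical, "within two years" is read here as the second anniversary of the completion date, and the "as close to the time of collection as is practicable" rule for continuing observations, large-scale community projects and NSF RAPID awards is not modeled.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (hypothetical helper)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return d.replace(year=d.year + years, day=28)


def latest_release_date(collection_complete: date,
                        publication: Optional[date] = None) -> date:
    """Latest date for public access under the assumed reading of the policy.

    Data (or sample metadata) are due within two years of completion of
    collection, or at the time of publication if that comes first.
    """
    deadline = add_years(collection_complete, 2)
    if publication is not None and publication < deadline:
        return publication
    return deadline


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(latest_release_date(date(2024, 6, 30)))
    print(latest_release_date(date(2024, 6, 30), publication=date(2025, 1, 15)))
```

A DMP would normally state these dates per data or sample type, since the point at which collection is considered complete can differ by discipline.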

Some programs within EAR have specific guidelines regarding data and sample acquisition, permitting and repository selection. Please see the relevant program solicitation(s) and consult with the cognizant program director(s) for further information.

DMP content for EAR proposals

To fulfill the requirements described above and to ensure alignment with open science principles, PIs are encouraged to structure their DMPs around the following two sections:

  1. Data and sample types. Describe the types of data and samples expected to result from the proposed work:
    1. List the types of data and samples to be collected and/or generated. The listing of each data/sample type should briefly identify what metadata will be provided and when data/sample preparation will be considered complete. (Definitions of "data" and "samples" are explained above within "EAR requirements.") For proposals providing community-serving infrastructure or research services, the DMP should describe the data/sample types to be managed and what guidance or support will be provided to help users meet their data/sample sharing obligations. EAR recognizes that data and samples may undergo multiple transformations in the research process (including destructive analyses), and disciplinary expectations for assignment of metadata and retention of intermediate data and sample products may vary.
    2. For each data or sample type, identify which personnel and institution(s) will be designated for its management, including contingency plans for the departure of key personnel from the project. For collaborative projects, PI(s) of the award(s) associated with the designated personnel and institution(s) are ultimately responsible for overseeing and reporting on their data and sample management activities.
  2. Data/sample deposit, access and preservation. Describe how each type of data or sample will be deposited, made accessible and preserved:
    1. For each data type listed, identify an appropriate long-lived FAIR-aligned repository for data deposit, the timeframe for public data access, and the expected period of data preservation. For each sample type listed, identify an appropriate long-lived FAIR-aligned repository for indexing sample metadata, the location for sample storage (preferably a repository appropriate for the specific sample type) and the expected period of sample preservation. (Required timeframes for data and sample access are specified above within "EAR Requirements.") Many repositories commit to preserve access to data and samples indefinitely; any deviations from this expectation should be explained. PIs are encouraged to coordinate with designated repositories in advance of planned data/sample submission.
    2. In most cases, it is sufficient for the DMP to identify the repositories to be used and the timeframe for access and preservation for each type of data/sample identified. In these cases, the selected repositories should align with FAIR principles and community-specific standards. Occasionally, appropriate long-lived FAIR-aligned repositories do not exist for certain types of data or samples. In such cases, it may be necessary to adopt alternative approaches to data access and retention, such as via use of a local computer server. In such cases, the DMP should explain how the proposed approach fulfills important attributes for FAIR-aligned repositories, consistent with OSTP guidance, "Desirable Characteristics of Data Repositories for Federally Funded Research." These attributes include but are not limited to the following:
      1. Findability. Data should be findable via standard search tools, such as through the assignment of globally unique persistent identifiers (e.g., digital object identifiers (DOIs) and International Geo Sample Numbers (IGSNs)) and rich metadata that is indexed in a searchable resource.
      2. Accessibility. Data should be publicly accessible to other researchers, at no more than incremental cost, within the specified timeframe. Any data access limitations must be justified. "Data available upon request" is not acceptable.
      3. Interoperability. To ensure interoperability, data should be described via appropriate metadata standards, in alignment with expectations of the associated scientific discipline(s).
      4. Reusability. To facilitate the broadest possible data reuse, data should be assigned clear and accessible usage licenses and metadata descriptors that identify provenance. EAR expects the adoption of unrestrictive open licenses except with specific justification. 
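
Where an alternative approach (such as a local server) is proposed, a simple self-check against the four attributes above may help when drafting the DMP. The sketch below is only an illustration: the record fields and the `fair_gaps` function are hypothetical names, not part of any NSF or repository schema, and passing the check does not by itself demonstrate alignment with the OSTP guidance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical description of one dataset in a DMP draft."""
    persistent_id: str = ""      # e.g., a DOI or IGSN (Findability)
    access_url: str = ""         # public location of the data (Accessibility)
    metadata_standard: str = ""  # discipline metadata standard (Interoperability)
    license: str = ""            # usage license (Reusability)
    provenance: str = ""         # provenance descriptor (Reusability)


def fair_gaps(record: DatasetRecord) -> List[str]:
    """List the attributes for which the record has no entry."""
    checks = [
        ("Findability", "persistent_id"),
        ("Accessibility", "access_url"),
        ("Interoperability", "metadata_standard"),
        ("Reusability", "license"),
        ("Reusability", "provenance"),
    ]
    gaps = []
    for attribute, field_name in checks:
        if not getattr(record, field_name).strip():
            gaps.append(f"{attribute}: missing {field_name}")
    return gaps


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    draft = DatasetRecord(
        persistent_id="doi:10.0000/placeholder",
        access_url="https://example.org/dataset",
        metadata_standard="placeholder standard",
        provenance="derived from field campaign samples",
    )
    for gap in fair_gaps(draft):
        print(gap)
```

Any gap reported this way corresponds to a point the DMP would need to address or justify.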

Costs associated with data and sample management

NSF recognizes that data management activities require time and expense, including upfront curation costs that may be charged by data and sample repositories. Expenditures for such activities are allowable and should be documented and budgeted appropriately. See Dear Colleague Letter: Effective Practices for Data (NSF 19-069) for further guidance. 

Award reporting

PIs and co-PIs are responsible for providing updates to NSF within annual and final project reports on data and sample management activities carried out by personnel and institution(s) associated with their awards, as designated in the DMP. PIs and co-PIs are also expected to provide updates to the general public by reporting on data and sample management activities within the Project Outcomes Report and by indexing research products within the NSF Public Access Repository (NSF-PAR). The Project Outcomes Report and NSF-PAR entries can be viewed on the public-facing award page.

Annual and final project reports to NSF should contain the following information:

  • Describe ongoing data and sample management activities, including data/samples in preparation that have not yet been shared, in the "Accomplishments" section of the report. Describe any significant deviations from the proposal DMP in the "Changes/Problems" section of the report.
  • List data and/or samples that have been made available within the prior year within the "Products" section of the report. Such listings, which may be facilitated by indexing associated metadata within the NSF-PAR, should include globally unique persistent identifiers, such as digital object identifiers (DOIs) or International Geo Sample Numbers (IGSNs).
  • For data or samples that will be made available after submission of the final project report, the PI should include plans for data and sample availability in the "Accomplishments" section of the report. Once the data and/or samples are made available, the PI should index associated metadata in the NSF-PAR and notify the cognizant program director by e-mail. 

Annual and final project reports that do not adequately address data and sample management activities may be returned to PIs to provide the required information.

Implementation of prior DMPs may be considered during evaluation of subsequent proposals. Description of data and sample management for prior awards should be included in the "Results from Prior NSF Support" section of the project description of the proposal. When appropriate, this section should include evidence that data, samples, and other products have been made accessible in appropriate repositories. All products that are specifically listed in the "Results from Prior NSF Support" section must be referenced in the References Cited section of the proposal with globally unique persistent identifiers (e.g., DOIs or IGSNs).

Resources

To facilitate adherence to the EAR data and sample policy and open science practices, EAR maintains this list of resources for EAR proposers and awardees. EAR recognizes that there is a large ecosystem of resources to support the management of data, samples and other research products. This list is not exhaustive, nor is it meant to endorse particular resources. EAR will periodically update this list.

Data-oriented resources: 

Education-oriented resources:

Sample/collections-oriented resources:

Software/model-oriented resources:

For preparation of data management plans (DMPs):

For data deposit and access:

For open-source software development: 


Past EAR DMP guidance and resources

The Division of Earth Sciences is committed to the establishment, maintenance, validation, description, and distribution of high-quality data sets. Per the NSF policy on Dissemination and Sharing of Research Results, as stated in the Proposal & Award Policies & Procedures Guide (PAPPG), Principal Investigators (PIs) are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections, and other supporting materials created or gathered in the course of work under NSF grants. 

The Division of Earth Sciences requires that full data sets, derived data products (e.g., model results, output, and workflows), software, and physical collections must be made publicly accessible within two (2) years of final collection. This two-year period may be extended under exceptional circumstances, but only by written agreement between the Principal Investigator and the National Science Foundation. For continuing observations or for long-term projects (deployments lasting > 36 months) where telemetry solutions allow, data are to be made public in real to near-real time. Provision must also be made for the archiving of physical samples collected as part of a project, as feasible. A description of how this will be accomplished and how long the samples will be curated should be included as part of the Data Management Plan, described below. 

All NSF proposals must include a document of no more than two pages uploaded under "Data Management Plan" in the supplementary documentation section of the proposal. This supplementary document should describe what data/samples will be collected, what analyses will be done, and how the project will provide open and rapid access to samples, data, derived data products (e.g., models and model output), and other information on the project during and after the project's completion. Some types of data may be considered "final" at different stages of processing in different fields. Thus, PIs should define, in their data management plans, in what state they would consider their data to be final and ready for public access. The Data Management Plan should also specifically discuss how the investigators will achieve the EAR data archiving and reporting requirements described in this document. If the project is not expected to generate new data, samples or derived data products, the Data Management Plan should include a statement that no detailed plan is needed, accompanied by a clear justification. See the PAPPG for additional information. Some programs within the Division of Earth Sciences have specific guidelines regarding data and sample acquisition, permitting and repository selection. Please see the relevant program solicitation and consult with the cognizant Program Director for further information.

Preferred data and physical collection archives and centers can be found in the Appendix or through contact with the cognizant Program Director of the program. Where no repository or archive exists for collected data and samples, the PI is required to identify a preservation plan in the Data Management Plan that complies with the general philosophy of sharing research products and data within two years of collection as described above. This could include a museum- or university-hosted repository if that repository is intended for long-term curation. Any limit on access to data, samples, or other information beyond the two-year moratorium period must be based on compelling justification, documented in the Data Management Plan of the proposal, or approved by the cognizant Program Director. 

PIs are required to provide updates on the status of metadata and data archiving in Annual Project Reports. Compliance with the project Data Management Plan must be documented in the Final Project Report. Identifiers for archived metadata and data, such as Digital Object Identifiers (DOIs) or persistent Uniform Resource Locators (URLs), must be included in these reports in the section entitled "Products-Websites." Where the Final Report is due before the required date of sample or data submission, the PI must report plans for final data/sample submission. The PI should notify the cognizant Program Director by e-mail after final data and/or sample submission has occurred, even if this is after the end date of the award. 

Recommended services for finding, accessing, sharing, and archiving EAR-funded research products are listed below. When making plans for data sharing and archiving, check in advance with managers of designated data service entities to ensure their ability to accommodate your expected research products. Data, samples, and models should be submitted according to formats specified by each entity. For general guidance, see the EAR Data and Sample Policy. If you have further questions regarding the home(s) for your research products, contact the Program Director for your program. 

Resources below are classified by type, as follows. These are only suggested classifications. 

  • National Data Centers – Large-scale data facilities managed by other federal agencies (e.g., NASA, NOAA) 
  • Data Resources – Repositories or other resources for depositing, sharing, and/or archiving datasets 
  • Sample Resources – Repositories or other resources for documenting, depositing, sharing, and/or storing specimens and physical samples 
  • Software Resources – Repositories or other resources for depositing, sharing, and/or archiving software 
  • Data Portals – Portals to provide search capabilities across multiple data repositories 
  • Multi-purpose Resources – Resources to combine data, samples, software, and other functions 
  • Community Activities – Activities to organize and support researchers in sharing research products 

Note: This list is meant to help in data management planning, but PIs are free to work with other data service entities as suitable for their projects. URLs and contact information are current as of publication of this document in March 2018. If you notice a change that should be made, please contact your Program Director.

AMCSD 
American Mineralogist Crystal Structure Database 
An interface to a crystal structure database that includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals. 
Data Resources 
rruff.geo.arizona.edu/AMS/amcsd.php 

Arctic Data Center 
Discoverable data products from NSF-funded science in the Arctic, including data, metadata, documents, software, and provenance. Includes submission and data management tools. 
Data Resources 
https://arcticdata.io 

Arctos
A data management system that provides fundamental research infrastructure for biodiversity data, and is intended for curators, collection managers, investigators, educators, and anyone interested in natural and cultural history.
Multi-purpose Resources
https://arctosdb.org/about/

BCO-DMO 
The Biological and Chemical Oceanography Data Management Office 
Biological, chemical and physical oceanography measurements and experimental and model results, including CTD, biological abundance, meteorological, nutrient, pH, carbonate, PAR, sea surface temperature, heat and momentum flux, sediment composition, trace metals, primary production, and pigment concentration measurements, and with images and movies. 
Data Resources 
http://www.bco-dmo.org/ 

CEOAS 
Oregon State University Marine Geology Repository 
A curation facility for marine rock and sediment samples. Our mission is to facilitate research, education, and the advancement of scientific knowledge through access and use of our diverse collection of rock, lake, and marine sediment samples. 
Sample Resources 
osu-mgr.org 

CIG 
Computational Infrastructure for Geodynamics 
A community-driven organization that advances Earth science by developing and disseminating community modeling software, and by hosting other codes for geophysics and related fields.
Software Resources 
geodynamics.org  

CSDCO 
Continental Scientific Drilling Coordination Office 
Archives of samples, data, publications, and reference collections that are critical community infrastructure components for continental drilling and coring. Related to LacCore, below. 
Multi-purpose Resources 
https://csdco.umn.edu/ 

LacCore 
National Lacustrine Core Facility 
Associated with CSDCO, it provides infrastructure for scientists utilizing core samples from Earth’s continents. Includes pollen and seed reference collections for use by any researcher to advance the interpretation of sediment core samples. 
Sample Resources 
http://lrc.geo.umn.edu/laccore/ 

CSDMS 
Community Surface Dynamics Modeling System 
Focused on the modeling of earth surface processes by developing, supporting, and disseminating integrated software modules that predict the movement of fluids, and the flux (production, erosion, transport, and deposition) of sediment and solutes in landscapes and their sedimentary basins. CSDMS maintains collections of data products, modeling tools, and standard names for data/model interoperability. 
Multi-purpose Resources 
http://csdms.colorado.edu/ 

CINERGI 
Community Inventory of EarthCube Resources for Geosciences Interoperability 
An inventory of available information across geosciences domains with resources that have consistent and easy-to-interpret descriptions, traceable origins, and documentation that is as complete as possible. 
Data Portals 
http://cinergi.sdsc.edu/geoportal/ 

CUAHSI 
Consortium of Universities for the Advancement of Hydrologic Science, Inc. 
Support and resources for the water science community and other critical-zone science groups for accessing data, publishing data, implementing data-driven education, developing data tools, and collaborating with each other around data and models. CUAHSI data services include HydroClient, HydroShare and WaterML.
Multi-purpose Resources 
www.cuahsi.org 

CZEN 
Critical Zone Exploration Network 
A community of people and a network of field sites investigating processes within the Critical Zone, defined as the Earth’s outer layer from vegetation canopy to the soil and groundwater that sustains human life.  
Multi-purpose Resources 
czen.org 

DataOne 
Data Observation Network for Earth 
A distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. 
Data Portals 
www.dataone.org 

EarthCube 
A dynamic System of Systems (SoS) infrastructure and data tools to discover and access all forms of geoscience data and resources, using advanced collaboration, technological, and computational capabilities.
Community Activities 
www.earthcube.org 

EarthScope 
An NSF program that acquires, delivers, and archives data, develops data analysis protocols and products, provides engineering services for field instrument deployment, and organizes communities to study the structure and evolution of the North American continent and the processes that cause earthquakes and volcanic eruptions. 
Multi-purpose Resources 
www.earthscope.org 

GCMD 
Global Change Master Directory 
A directory resource that contributes to scientific research by providing stewardship of metadata and direct access to Earth science data, metadata, and services. 
National Data Centers 
https://gcmd.nasa.gov/ 

GenBank 
A genetic sequence database containing an annotated collection of all publicly available DNA sequences. 
Data Resources 
https://www.ncbi.nlm.nih.gov/genbank/ 

GeoLink 
Collection of standard protocols, formats, and vocabularies, often characterized as the Semantic Web. Includes content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.
Community Activities
www.geolink.org 

Gigapan 
A repository for over 50,000 gigapixel panoramic images from around the world that can be either uploaded or constructed using their online portal. 
Data Resources 
http://www.gigapan.com 

GitHub
A platform for version control and collaboration on open-source software development.
Software Resources
www.github.com 

GRSciColl 
Global Registry of Scientific Collections 
A community-curated, comprehensive clearinghouse of information about object-based scientific collections using institution and collection codes for registry purposes. 
Data Portals 
http://grscicoll.org/ 

ICDP DIS 
International Continental Scientific Drilling Program Drilling Information System 
Documentation and administration of (1) basic, initial, and primary data, (2) initial measurements and reports, and (3) sample requests, sample curation and sample distribution as it relates to continental scientific drilling.  
Multi-purpose Resources 
www.icdp-online.org/support/service/data-sample-management/drilling-information-system/ (no longer active)

iDigBio 
Integrated Digitized Biocollections 
An open-access portal focused on wide-spread electronic data sharing of biological specimens. 
Data Portals 
https://www.idigbio.org/ 

IEDA 
Interdisciplinary Earth Data Alliance 
IEDA is a community-based facility that serves to support, sustain, and advance the geosciences by providing data services for observational geoscience data from the Ocean, Earth, and Polar Sciences.
Multi-purpose Resources 
http://www.iedadata.org/ 

Community-specific resources within IEDA: 

EarthChem 
The broad portal for geochemical data of the solid earth with access to complete data from multiple data systems. 
Data Resources 
www.earthchem.org 

Geochron 
Part of EarthChem focused on high-precision geochronology and quantitative chronostratigraphy. 
Data Resources 
http://www.geochron.org/ 

GeoPRISMS Data Portal 
Geodynamic Processes at Rifting and Subducting Margins Data Portal 
Provides access to program information and data collected through the GeoPRISMS program. 
Multi-purpose Resources 
www.marine-geo.org/portals/geoprisms 

LEPR 
Library of Experimental Phase Relations 
A database of results of published experimental studies involving liquid-solid phase equilibria relevant to natural magmatic systems. 
Data Resources 
https://lepr.earthchem.org/access_user/login.php 

PetDB 
Chemical, isotopic, and mineralogical data for rocks, minerals, and melt inclusions, focusing on igneous and metamorphic rocks from the ocean floor (specifically mid-ocean ridge basalts and abyssal peridotites) and xenolith samples from the Earth's mantle and lower crust. 
Data Resources 
https://www.earthchem.org/ecl/

SESAR 
System for Earth Sample Registration 
Operates a registry that distributes the International Geo Sample Number (IGSN). SESAR catalogs and preserves sample metadata profiles, and provides access to the sample catalog via the Global Sample Search.
Data Resources 
http://www.geosamples.org/ 

INTERMAGNET 
International Real-time Magnetic Observatory Network 
A global network of data exchange between geomagnetic observatories. Its goals are to (1) establish and maintain digital geomagnetic observatories in remote areas; (2) standardize geomagnetic measuring and recording equipment; and (3) establish a real-time world-wide data exchange using existing meteorological satellites.
Multi-purpose Resources 
https://intermagnet.org/

IRIS 
Incorporated Research Institutions for Seismology 
The consortium of U.S. research programs in seismology provides management of, and access to, observed and derived data for the global earth science community for timeseries, earthquake and seismic events, as well as ground motion, atmospheric, infrasonic, hydrological, and hydroacoustic data. Community derived data services and tools are also available. 
Multi-purpose Resources 
www.iris.edu/hq/ 

KBase 
U.S. DOE Systems Biology Knowledgebase 
For systems biology: predicting and designing biological function. An open-source large-scale bioinformatics system that enables users to upload data, analyze it with collaborator and public data (as needed), build increasingly realistic models, and share and publish their workflows and conclusions.
Multi-purpose Resources 
kbase.us 

MagIC 
Magnetics Information Consortium 
Promoting information technology infrastructures for the international paleomagnetic, geomagnetic and rock magnetic community. 
Multi-purpose Resources 
https://www2.earthref.org/MagIC 

MATLAB File Exchange 
Matrix Laboratory File Exchange 
Allows users to find or share custom applications, classes, code examples, drivers, functions, Simulink models, scripts, and videos. 
Software Resources 
www.mathworks.com/matlabcentral/fileexchange/ 

Morphobank 
A web application with tools and archives for evolutionary research, specifically systematics (the science of determining the evolutionary relationships among species).
Data Resources
www.morphobank.org  

NCBI 
National Center for Biotechnology Information 
NCBI collects submissions of data for the world's largest public repository of biological and scientific information. 
Data Resources 
https://www.ncbi.nlm.nih.gov/home/submit/ 

Neotoma 
Neotoma Paleoecology Database 
A hub whose structure facilitates interdisciplinary, multiproxy analyses and common tool development. Data currently include North American Pollen (NAPD) and fossil mammals (FAUNMAP). Data are derived from sites from the last 5 million years. 
Data Resources 
https://www.neotomadb.org/ 

NMNH Biorepository 
Smithsonian National Museum of Natural History Biorepository 
The NMNH Biorepository is a large museum-based natural history biorepository containing free, permanent, archival storage of DNA sequences, tissues and phenotype vouchers of genomic research and collections. 
Sample Resources 
http://naturalhistory.si.edu/rc/biorepository/ 

NOAA's National Centers have been consolidated into the National Centers for Environmental Information, or NCEI. 
These include: 

NCEI 
The National Centers for Environmental Information 
NOAA's National Centers for Environmental Information (NCEI) is responsible for preserving, monitoring, assessing, and providing public access to the Nation's treasure of climate and historical weather data and information. 
National Data Centers 
http://www.ncei.noaa.gov/ 

NGDC 
National Geophysical Data Center  
Geophysical, geological and geochemical data: bathymetry, magnetics, gravity, seismic and other quantitative geophysical data; geological data including station locations, collection/storage locations, preliminary descriptions of seafloor samples recovered, and all descriptions and analytical data, including geochemistry, derived from sediment and rock samples. 
National Data Centers 
https://www.ngdc.noaa.gov/ 

NODC 
National Oceanographic Data Center 
NODC has implemented numerous interoperable data technologies to enhance the discovery, understanding, and use of the vast quantities of oceanographic data in the NODC archives.  
National Data Centers 
https://www.nodc.noaa.gov/access/services.html 

NCDC 
National Climatic Data Center 
NCDC provides public access to climate and historical weather data and information, including the World Data System for paleoclimatology, which houses a wide range of solar, geophysical, environmental, and human dimensions data. 
National Data Centers 
https://www.ncdc.noaa.gov/climate-information 

NSIDC 
National Snow & Ice Data Center 
Snow pack and other glaciological data. 
National Data Centers 
http://nsidc.org/ 

ORNL DAAC 
Oak Ridge National Laboratory Distributed Active Archive Center 
An archive of data produced by NASA's Terrestrial Ecology Program that is relevant to understanding the dynamics and processes of the biological, geological, and chemical components of Earth's environment.
National Data Centers 
http://daac.ornl.gov/archival_contact_form.html  

OT 
Open Topography 
A LiDAR database with a primary emphasis on earth science related, research-grade, topography and bathymetry data. 
Data Resources 
http://opentopography.org/ 

PaleoBioDB 
Paleobiology Database 
A database created to provide global, collection-based occurrence and taxonomic data for organisms of all geological ages, as well as data services to allow easy access to data for independent development of analytical tools, visualization software, and applications of all types. 
Data Resources 
https://paleobiodb.org/classic 

PANGAEA 
World Data Center PANGAEA 
The World Data Center PANGAEA is a Member of the ICSU World Data System, and is an Open Access library aimed at archiving, publishing and distributing georeferenced data from Earth System research. 
Data Resources 
https://pangaea.de 

RRUFF 
The RRUFF™ Project is a database that contains a complete set of high quality spectral data, including infrared, Raman, and X-ray diffraction, from well characterized minerals.
Data Resources 
http://rruff.info/ 

SEAD 
Sustainable Environment Actionable Data 
SEAD offers data tools that allow researchers to more easily manage, interpret, share, and publish scientific data to institutional partner repositories. Projects can create collaborative data sharing workspaces with customized metadata schema. 
Data Resources 
https://sead2.ncsa.illinois.edu/  

SEED 
Sustainability Education & Economic Development 
Supporting community colleges in education through submissions of classroom resources that fall under the following categories: solar, wind, alternative fuels, geothermal, green building, energy efficiency, sustainable agriculture, food & land, transportation & fuel, general clean tech, sustainability education, and all other sectors. 
Community Activities
http://www.theseedcenter.org/ 

SEN 
Sediment Experimentalists Network  
Focused on Earth-surface research. Includes (1) Experimental Collaboratories (SEN-EC) to facilitate collaborative multi-institution experiments, (2) Education and Data Standards (SEN-ED) initiative to develop and spread best practices for performing and documenting research work and (3) a Knowledge Base (SEN-KB) to share experimental methods and datasets. 
Community Activities 
http://sedexp.net/ 

SERC 
Smithsonian Environmental Research Center 
Smithsonian researchers and citizen scientists collect data that spans both space and time. SERC has long-term datasets that track decades of environmental change, as well as plant and animal databases that cover the U.S. and beyond. 
Data Portals 
https://serc.si.edu/environmental-data 

Sourceforge 
Find, create, and publish open source software for free.
Software Resources 
https://sourceforge.net/ 

Symbiota 
An open source software framework that strives to integrate biological community knowledge and data to synthesize a network of databases and tools that will aid in increasing our overall environmental comprehension. 
Multi-purpose Resources 
http://symbiota.org/docs/ 

TreeBASE 
A repository of phylogenetic information such as trees of species, trees of populations, and trees of genes that represent all biotic taxa. 
Data Resources 
https://treebase.org/ 

USGS ScienceBase 
A collaborative scientific database for products of research associated with the United States Geological Survey (USGS). 
National Data Centers 
https://www.sciencebase.gov/catalog/ 

UNAVCO 
University Navigation Satellite Timing and Ranging Consortium 
A non-profit university-governed consortium that facilitates geoscience research and education using geodesy. The repository database accepts GPS/GNSS and Imaging (SAR and TLS) data. 
Multi-purpose Resources 
http://www.unavco.org/ 

VertNet 
Allows discovering, capturing, and publishing of biodiversity data using the collaboration between hundreds of biocollections.
Multi-purpose Resources 
http://vertnet.org/  

Vhub 
A free online resource for collaboration in volcanology research and risk mitigation. VHub provides easy mechanisms for sharing tools to model volcanic processes and analyze volcano data, to share resources such as teaching materials and workshops, and to communicate with other members of the volcanology community. 
Multi-purpose Resources 
https://vhub.org/  

WHOAS 
Woods Hole Open Access Server 
Institutional repository for WHOI community. 
Data Resources 
http://darchive.mblwhoilibrary.org/ 

The Division of Earth Sciences is committed to the establishment, maintenance, validation, description, and distribution of high-quality, long-term data sets. Per the NSF policy on Dissemination and Sharing of Research Results, as stated in the Proposal & Award Policies & Procedures Guide (PAPPG), Principal Investigators (PIs) are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections, and other supporting materials created or gathered in the course of work under NSF grants. 

The Division of Earth Sciences requires that full data sets, derived data products (e.g., model results, output, and workflows), software, and physical collections be made publicly accessible within two (2) years of final collection. This two-year period may be extended under exceptional circumstances, but only by written agreement between the Principal Investigator and the National Science Foundation. For continuing observations or for long-term projects (deployments lasting > 36 months) where telemetry solutions allow, data are to be made public in real to near-real time. Provision must also be made for the archiving of physical samples collected as part of a project. A description of how this will be accomplished and how long the samples will be curated should be included as part of the Data Management Plan, described below. 

All NSF proposals must include a document of no more than two pages uploaded under "Data Management Plan" in the supplementary documentation section of the proposal. This supplementary document should describe what data/samples will be collected, what analyses will be done, and how the project will provide open and rapid access to samples, data, derived data products (e.g., models and model output), and other information on the project during and after the project's completion. Some types of data may be considered “final” at different stages of processing in different fields. Thus, PIs should define, in their data management plans, in what state they would consider their data to be final and ready for public access. In addition, the Data Management Plan should specifically discuss how the investigators will achieve the specific EAR data archiving and reporting requirements described in this document. If the project is not expected to generate new data, samples or derived data products, the Data Management Plan should include a statement that no detailed plan is needed, accompanied by a clear justification. See the PAPPG for additional information. Some programs within the Division of Earth Sciences have specific guidelines regarding data and sample acquisition, permitting and repository. Please see the relevant program solicitation and consult with the cognizant Program Director for further information. 

Preferred data and physical collection archives and centers can be found in the Appendix or through contact with the cognizant Program Director of the program. Where no repository or archive exists for collected data and samples, the PI is required to identify a preservation plan in the Data Management Plan that complies with the general philosophy of sharing research products and data within two years of collection as described above. This could include a museum- or university-hosted repository if that repository is intended to be curated long-term (as an example: The University of Michigan’s Deep Blue data archive). Any limit on access to data, samples, or other information beyond the two-year moratorium period must be based on compelling justification, documented in the Data Management Plan of the proposal, or approved by the cognizant Program Director. 

PIs are required to provide updates on the status of metadata and data archival in Annual Project Reports. Compliance with the project Data Management Plan must be documented in the Final Project Report. If not deposited in an EAR-identified federally funded repository, URLs for archived metadata and data must be included in these reports in the section entitled "Products-Websites." Where the Final Report is due before the required date of sample or data submission, the PI must report plans for final data/sample submission. The PI should notify the cognizant Program Director by e-mail after final data and/or sample submission has occurred, even if this is after the end date of the award. 

Recommended repositories for EAR-funded research products are listed below. Data, samples, and models should be submitted according to formats designated by each entity. For general guidance, see the EAR Data and Sample Policy, and if you have further questions regarding the home for your products, contact the Program Director for your program.

Note: URLs and contact information are current as of publication of this document in June 2017.

National Data Centers

NSIDC
National Snow & Ice Data Center
http://nsidc.org/
Snow pack and other glaciological data.

NOAA's National Centers have been consolidated into the National Centers for Environmental Information, or NCEI. These include:

NCDC
The National Climatic Data Center
https://www.ncdc.noaa.gov/
NOAA's National Centers for Environmental Information (NCEI) is responsible for preserving, monitoring, assessing, and providing public access to the Nation's treasure of climate and historical weather data and information.

NGDC
National Geophysical Data Center
http://www.ngdc.noaa.gov/ngdc.html
Geophysical, geological and geochemical data: bathymetry, magnetics, gravity, seismic and other quantitative geophysical data; geological data including station locations, collection/storage locations, preliminary descriptions of seafloor samples recovered, and all descriptions and analytical data, including geochemistry, derived from sediment and rock samples.

NODC
National Oceanographic Data Center
https://www.nodc.noaa.gov/access/services.html
NODC has implemented numerous interoperable data technologies to enhance the discovery, understanding, and use of the vast quantities of oceanographic data in the NODC archives.

 

Other Database and Repository Resources

ACADIS
Advanced Cooperative Arctic Data & Information Service
www.aoncadis.org
Discoverable data products from NSF-funded science in the Arctic, including data, metadata, documents, software, and provenance. Includes submission and data management tools.

AMCSD
American Mineralogist Crystal Structure Database
rruff.geo.arizona.edu/AMS/amcsd.php
An interface to a crystal structure database that includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals.

Arctos 
https://arctosdb.org/about/ 
A data management system that provides fundamental research infrastructure for biodiversity data, and is intended for curators, collection managers, investigators, educators, and anyone interested in natural and cultural history.

BCO-DMO 
The Biological and Chemical Oceanography Data Management Office 
http://www.bco-dmo.org/ 
Biological, chemical and physical oceanography measurements and experimental and model results, including CTD, biological abundance, meteorological, nutrient, pH, carbonate, PAR, sea surface temperature, heat and momentum flux, sediment composition, trace metals, primary production, and pigment concentration measurements, and with images and movies.

CEOAS 
Oregon State University Marine Geology Repository 
osu-mgr.org 
A curation facility for marine rock and sediment samples. Our mission is to facilitate research, education, and the advancement of scientific knowledge through access and use of our diverse collection of rock, lake, and marine sediment samples.

CIG 
Computational Infrastructure for Geodynamics 
geodynamics.org 
A community-driven organization that advances Earth science by developing and disseminating software for geophysics and related fields.

CSDCO 
Continental Scientific Drilling Coordination Office 
https://csdco.umn.edu/ 
Archives of samples, data, publications, and reference collections that are critical community infrastructure components for continental drilling and coring. Related to LacCore, below.

LacCore 
National Lacustrine Core Facility 
http://lrc.geo.umn.edu/laccore/ 
Associated with CSDCO, it provides infrastructure for scientists utilizing core samples from Earth’s continents. Includes pollen and seed reference collections for use by any researcher to advance the interpretation of sediment core samples.

CSDMS 
Community Surface Dynamics Modeling System 
http://csdms.colorado.edu/ 
Focused on the modeling of earth surface processes by developing, supporting, and disseminating integrated software modules that predict the movement of fluids, and the flux (production, erosion, transport, and deposition) of sediment and solutes in landscapes and their sedimentary basins.

CINERGI 
Community Inventory of EarthCube Resources for Geosciences Interoperability 
www.earthcube.org/group/cinergi 
An inventory of available information across geosciences domains with resources that have consistent and easy-to-interpret descriptions, traceable origins, and documentation that is as complete as possible.

CUAHSI 
Consortium of Universities for the Advancement of Hydrologic Science, Inc. 
www.cuahsi.org 
Support and resources for the water science community and other critical-zone science groups for accessing data, publishing data, implementing data-driven education, developing data tools, and collaborating with each other around data and models.

CZEN 
Critical Zone Exploration Network 
czen.org 
A community of people and a network of field sites investigating processes within the Critical Zone, defined as the Earth’s outer layer from vegetation canopy to the soil and groundwater that sustains human life.

DataOne 
Data Observation Network for Earth 
www.dataone.org 
A distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

EarthCube 
www.earthcube.org 
A dynamic System of Systems (SoS) infrastructure and data tools to collect, access, analyze, share, and visualize all forms of geoscience data and resources, using advanced collaboration, technological, and computational capabilities.

EarthScope 
www.earthscope.org 
An NSF program that acquires, delivers, and archives data, develops data analysis protocols and products, provides engineering services for field instrument deployment, and organizes the community to study the structure and evolution of the North American continent and the processes that cause earthquakes and volcanic eruptions.

GCMD 
Global Change Master Directory 
https://data.nasa.gov/Earth-Science/Global-Change-Master-Directory-GCMD-/gt6i-nuv6/about_data
A directory resource that contributes to scientific research by providing stewardship of metadata and direct access to Earth science data, metadata, and services.

GenBank 
https://www.ncbi.nlm.nih.gov/genbank/ 
A genetic sequence database containing an annotated collection of all publicly available DNA sequences.

GeoLink 
www.geolink.org 
Collection of standard protocols, formats, and vocabularies, often characterized as the Semantic Web. Includes content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

GeoPRISMS Data Portal 
Geodynamic Processes at Rifting and Subducting Margins Data Portal 
www.marine-geo.org/portals/geoprisms 
Provides access to program information and data collected through the GeoPRISMS program.

Gigapan 
gigapan.com (no longer active)
A repository for over 50,000 gigapixel panoramic images from around the world that can be either uploaded or made using their online portal.

Github 
www.github.com 
A repository where versions of code can be stored, allowing it to host open-source software projects.

GRSciColl 
Global Registry of Scientific Collections 
http://grscicoll.org/ 
A community-curated, comprehensive clearinghouse of information about object-based scientific collections using institution and collection codes for registry purposes.

iDigBio 
Integrated Digitized Biocollections 
https://www.idigbio.org/ 
An open-access portal focused on wide-spread electronic data sharing of biological specimens.

ICDP DIS 
International Continental Scientific Drilling Program Drilling Information System 
www.icdp-online.org/support/service/data-sample-management/drilling-information-system/ 
Documentation and administration of (1) basic, initial, and primary data, (2) initial measurements and reports, and (3) sample requests, sample curation and sample distribution as it relates to continental scientific drilling.

IEDA 
Interdisciplinary Earth Data Alliance 
http://www.iedadata.org/ 
IEDA is a community-based facility that serves to support, sustain, and advance the geosciences by providing data services for observational Geoscience data from the Ocean, Earth, and Polar Sciences.

Community-specific resources within IEDA: 

EarthChem: The broad portal for geochemical data of the solid earth with access to complete data from multiple data systems. 
www.earthchem.org

PetDB: Chemical, isotopic, and mineralogical data for rocks, minerals, and melt inclusions, focusing on igneous and metamorphic rocks from the ocean floor (specifically mid-ocean ridge basalts and abyssal peridotites) and xenolith samples from the Earth's mantle and lower crust. 
http://www.earthchem.org/petdb

Geochron: Part of EarthChem focused on high-precision geochronology and quantitative chronostratigraphy. 
http://www.geochron.org/

SESAR: System for Earth Sample Registration 
Operates a registry that distributes the International Geo Sample Number (IGSN). SESAR catalogs and preserves sample metadata profiles, and provides access to the sample catalog via the Global Sample Search. 
http://www.geosamples.org/

LEPR: Library of Experimental Phase Relations 
A database of results of published experimental studies involving liquid-solid phase equilibria relevant to natural magmatic systems. 
http://lepr.ofm-research.org/YUI/access_user/login.php

INTERMAGNET 
International Real-time Magnetic Observatory Network 
http://www.intermagnet.org/index-eng.php 
A global network of data exchange between geomagnetic observatories. Its goals are to (1) establish and maintain digital geomagnetic observatories in remote areas; (2) standardize geomagnetic measuring and recording equipment and (3) establish a real-time world-wide data exchange using existing meteorological satellites.

IRIS 
Incorporated Research Institutions for Seismology 
www.iris.edu/hq/ 
Consortium of U.S. research programs in seismology to exchange seismic and other geophysical data.

KBase 
U.S. DOE Systems Biology Knowledgebase 
kbase.us 
For systems biology: predicting and designing biological function. An open-source large-scale bioinformatics system that enables users to upload data, analyze it with collaborator and public data (as needed), build increasingly realistic models, and share and publish their workflows and conclusions.

MagIC 
Magnetics Information Consortium 
https://www2.earthref.org/MagIC 
Promoting information technology infrastructures for the international paleomagnetic, geomagnetic and rock magnetic community.

MATLAB File Exchange 
Matrix Laboratory File Exchange
www.mathworks.com/matlabcentral/fileexchange/ 
Allows users to find or share custom applications, classes, code examples, drivers, functions, Simulink models, scripts, and videos.

MetPetDB 
http://metpetdb.rpi.edu/metpetweb/#home 
A database for metamorphic petrology.

Morphobank 
www.morphobank.org 
A web application with tools and archives for evolutionary research, specifically systematics (the science of determining the evolutionary relationships among species).

NCBI 
National Center for Biotechnology Information 
https://www.ncbi.nlm.nih.gov/home/submit/ 
NCBI collects submissions of data for the world's largest public repository of biological and scientific information.

Neotoma 
Neotoma Paleoecology Database 
https://www.neotomadb.org/ 
A hub whose structure facilitates interdisciplinary, multiproxy analyses and common tool development. Data currently include North American Pollen (NAPD) and fossil mammals (FAUNMAP). Data are derived from sites from the last 5 million years.

NMNH Biorepository 
Smithsonian National Museum of Natural History Biorepository 
http://naturalhistory.si.edu/rc/biorepository/ 
The NMNH Biorepository is a large museum-based natural history biorepository containing free, permanent, archival storage of DNA sequences, tissues and phenotype vouchers of genomic research and collections.

OT 
Open Topography 
http://opentopography.org/ 
A LiDAR database with a primary emphasis on earth science related, research-grade, topography and bathymetry data.

ORNL DAAC 
Oak Ridge National Laboratory Distributed Active Archive Center 
http://daac.ornl.gov/archival_contact_form.html 
An archive of data produced by NASA's Terrestrial Ecology Program that is relevant to understanding the dynamics and processes of the biological, geological, and chemical components of Earth's environment.

PaleoBioDB 
Paleobiology Database 
https://paleobiodb.org/classic 
A database created to provide global, collection-based occurrence and taxonomic data for organisms of all geological ages, as well as data services to allow easy access to data for independent development of analytical tools, visualization software, and applications of all types.

RRUFF 
http://rruff.info/ 
The RRUFF™ Project is a database that contains a complete set of high quality spectral data, including infrared, Raman, and X-ray diffraction, from well characterized minerals.

SEAD 
Sustainable Environment Actionable Data 
http://sead-data.net/connect-your-repository-to-sead (no longer active)
SEAD offers data tools that allow researchers to more easily manage, interpret, share, and publish scientific data to institutional partner repositories.

SEED 
Sustainability Education & Economic Development 
http://www.theseedcenter.org/ 
Supporting community colleges in education through submissions of classroom resources that fall under the following categories: solar, wind, alternative fuels, geothermal, green building, energy efficiency, sustainable agriculture, food & land, transportation & fuel, general clean tech, sustainability education, and all other sectors.

SEN 
Sediment Experimentalists Network 
http://sedexp.net/ 
Focused on Earth-surface research. Includes (1) Experimental Collaboratories (SEN-EC) to facilitate collaborative multi-institution experiments, (2) Education and Data Standards (SEN-ED) initiative to develop and spread best practices for performing and documenting research work and (3) a Knowledge Base (SEN-KB) to share experimental methods and datasets.

SERC 
Smithsonian Environmental Research Center 
https://serc.si.edu/environmental-data 
Smithsonian researchers and citizen scientists collect data that spans both space and time. SERC has long-term datasets that track decades of environmental change, as well as plant and animal databases that cover the U.S. and beyond.

Sourceforge 
https://sourceforge.net/ 
Find, create, and publish open source software for free.

Symbiota 
http://symbiota.org/docs/ 
An open source software framework that strives to integrate biological community knowledge and data to synthesize a network of databases and tools that will aid in increasing our overall environmental comprehension.

TreeBASE 
https://treebase.org/ 
A repository of phylogenetic information such as trees of species, trees of populations, and trees of genes that represent all biotic taxa.

UNAVCO 
University Navigation Satellite Timing and Ranging Consortium 
http://www.unavco.org/ 
A non-profit university-governed consortium that facilitates geoscience research and education using geodesy. The repository database accepts GPS/GNSS and Imaging (SAR and TLS) data.

VertNet 
http://vertnet.org/ 
Allows discovering, capturing, and publishing of biodiversity data using the collaboration between hundreds of biocollections.

Vhub 
https://vhub.org/ 
A free online resource for collaboration in volcanology research and risk mitigation. VHub provides easy mechanisms for sharing tools to model volcanic processes and analyze volcano data, to share resources such as teaching materials and workshops, and to communicate with other members of the volcanology community.

WDC Paleo 
World Data Center for Paleoclimatology 
https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/contributing 
A member of the World Data System, which houses a wide range of solar, geophysical, environmental, and human dimensions data.

WHOAS 
Woods Hole Open Access Server 
http://darchive.mblwhoilibrary.org/ 
Institutional repository for WHOI community.