Confronting models with observations
A hurricane rages off the coast of Florida. Planes fly into the eye of the storm, capturing details about the speed and structure of the hurricane and beaming this data back to headquarters.
On the coast, sensors draw in data on wave heights. Satellites image inundated neighborhoods. Twitter sentiment analysis tracks growing unease centered on a highway bottleneck. All of this information pours into a central control system and helps guide forecasts and shape evacuation plans.
This is the potential power of data-driven science.
We're not there yet, but increasing access to streams of observational data is transforming many branches of science. From sequencers and satellites to telescopes and tele-operated drone swarms, massive amounts of data are being collected in ever-new ways. These data offer a means to test and improve models that have been developed over decades.
Scientists use predictive models because they cannot experiment with the future climate or a city's highways in a laboratory, the way they do with chemical reactions or cell cultures. But these models contain errors and carry a certain degree of uncertainty. Increasingly, researchers are finding that incorporating real-time data into a model can improve its predictions, a process called data assimilation.
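At its simplest, data assimilation weighs a model's forecast against an observation according to how uncertain each one is. The sketch below shows that idea for a single quantity. It assumes independent Gaussian errors with known variances, which gives the textbook scalar Kalman filter update. The function name and the wind-speed numbers are hypothetical, and operational systems, DART included, work with many variables at once.

```python
def assimilate(forecast, forecast_var, obs, obs_var):
    """Combine a forecast and an observation of the same quantity.

    Assumes independent Gaussian errors with known variances; this is the
    scalar Kalman filter update.
    """
    gain = forecast_var / (forecast_var + obs_var)   # weight given to the observation
    analysis = forecast + gain * (obs - forecast)    # move the forecast toward the data
    analysis_var = (1.0 - gain) * forecast_var       # combined estimate is more certain
    return analysis, analysis_var


# Hypothetical numbers: the model forecasts 30 m/s winds (variance 16),
# while a reconnaissance aircraft measures 36 m/s (variance 4).
analysis, analysis_var = assimilate(30.0, 16.0, 36.0, 4.0)
print(f"{analysis:.1f} {analysis_var:.1f}")  # prints "34.8 3.2"
```

The analysis variance is smaller than either input variance. In that formal sense, combining model and data beats using either one alone.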
"This is a common problem, almost a generic problem in science," said Jeffrey Anderson, a senior scientist at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. "You observe a physical system and then you try to model it, and to do that, somehow you have to relate your model with your observations.
"At the end of the day, the scientific method is about prediction, and data assimilation is this core piece of the scientific method that sort of got ignored for a long time. We really view data assimilation as the tools for confronting models with observations."
Data assimilation for all
Typically, data is analyzed twice: first to determine how a system, like ocean currents or tornadoes, fundamentally operates, and again to establish the initial conditions, the starting point from which a simulation begins. This helps ensure that models and simulations reflect the best understanding of the science and the known conditions on the ground.
But such models have typically used static data. Today it's possible to incorporate dynamic, real-time data that offers even more assurance that a model or forecast is realistic.
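The difference between static and real-time data becomes clear when forecasting and assimilation alternate in a cycle. The model advances the state, a new observation arrives, and the state is corrected before the next forecast. The toy sketch below compares a free-running model with the same model corrected at every step. The "truth" signal, the model and the fixed correction weight are all hypothetical. A fixed weight is a deliberate simplification; real systems derive it from error statistics.

```python
import math


def truth(t):
    """Hypothetical real-world signal that the toy model cannot reproduce on its own."""
    return 10.0 + 3.0 * math.sin(0.3 * t)


def model_step(x):
    """Imperfect toy model: relaxes the state toward 10 and knows nothing about the oscillation."""
    return x + 0.1 * (10.0 - x)


def run(steps=50, assimilate=True, gain=0.8):
    """Return the mean absolute error of the state over the run."""
    x = truth(0)                      # start from the known initial condition
    errors = []
    for t in range(1, steps + 1):
        x = model_step(x)             # forecast step
        if assimilate:
            obs = truth(t)            # a new observation arrives (noise-free for simplicity)
            x = x + gain * (obs - x)  # analysis step with a fixed, assumed gain
        errors.append(abs(x - truth(t)))
    return sum(errors) / len(errors)


free_error = run(assimilate=False)
cycled_error = run(assimilate=True)
print(f"free run: {free_error:.2f}  with assimilation: {cycled_error:.2f}")
```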
Data assimilation has been part of weather prediction since the 1970s, but it was difficult to develop and implement, and laborious to change. As a result, it was used only in the most important and computationally intensive simulations, like official global weather predictions.
In the early 2000s, shortly after arriving at NCAR, Anderson--formerly a climate model developer--began thinking about ways to improve data assimilation and make it accessible to all scientists.
More data was becoming available every day, and the scientific community needed a way to use it. But the only data assimilation methods available were entangled with, and inseparable from, the codes used at the numerical weather prediction centers.
"I came to NCAR with the idea that we were at a time where, with proper software engineering techniques and the proper data assimilation algorithms, we could actually build a data assimilation system that could be used with any number of models and any number of observations," Anderson recalled.
He started a data assimilation research section at NCAR called DARES--a small, data-savvy team that helps Earth scientists incorporate data into their research.
"We really see ourselves as the unsexy member of this triumvirate of models, observations, and the data assimilation that puts the two together," Anderson said.
DARES became that community facility he had envisioned, with software, tools and documentation, plus people offering dedicated, hands-on support.
"We take very seriously NCAR's mission of supporting university scientists, providing them with the tools they need to move their research ahead," he said.
As part of their community-development work, they created a data assimilation tool called DART (Data Assimilation Research Testbed). It is used by more than three dozen large community codes and hundreds of scientists in areas ranging from space debris prediction to ocean currents. Released in 2004, DART continues to evolve and grow.
Several recent journal papers show the impact that DART, and data assimilation in general, are having on climate, weather and ocean modeling and on diverse other research areas.
Below are a few "snapshots" of findings enabled by data assimilation.
Cosmic rays and soil moisture
Scientists are always coming up with new ways of sensing the environment and of adapting those sensors to perform useful functions in society. One such example is the use of cosmic-ray sensors--among them the NSF-funded COsmic-ray Soil Moisture Observing System (COSMOS) project led by the University of Arizona--to measure soil moisture dynamics at an unprecedented scale.
The sensor counts neutrons at a particular energy level (called "fast neutrons"), whose intensity depends strongly on the amount of hydrogen in the soil. Hydrogen slows fast neutrons efficiently, so wetter soil yields fewer of them. By removing the effect of other sources of hydrogen, the sensor can measure soil moisture to depths of between 12 and 76 centimeters (roughly 5 inches to two and a half feet), depending on the water content.
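The link between neutron counts and soil water can be sketched with a calibration curve widely used in cosmic-ray neutron sensing: theta = a0 / (N/N0 - a1) - a2. Here N is the corrected fast-neutron count rate and N0 is a site-specific count rate over dry soil. In the formulation assumed here, theta is gravimetric water content; converting it to volumetric content would require the soil's bulk density. The coefficient values and the site count rate in this sketch are assumptions for illustration, not COSMOS calibration data.

```python
def soil_moisture(n_counts, n0, a0=0.0808, a1=0.372, a2=0.115):
    """Gravimetric soil water content (g/g) from a corrected fast-neutron count rate.

    n_counts: count rate already corrected for pressure, humidity and incoming
              cosmic-ray intensity
    n0:       site-specific count rate over dry soil (a calibration constant)
    a0-a2:    shape coefficients of the assumed calibration curve
    """
    return a0 / (n_counts / n0 - a1) - a2


N0 = 3000.0  # hypothetical dry-soil count rate for one site
wet = soil_moisture(2000.0, N0)  # fewer fast neutrons detected
dry = soil_moisture(2700.0, N0)  # more fast neutrons detected
print(f"wet: {wet:.3f} g/g, dry: {dry:.3f} g/g")
```

The inverse relationship is the key point. A drop in the neutron count means hydrogen, and therefore water, has been added to the sensor's footprint.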
Rafael Rosolem, a lecturer in Water and Environment Engineering at the University of Bristol, has been assimilating measurements from the COSMOS network to improve the performance of land surface models.
"Our collaborative work with NCAR and the University of Arizona showed the benefits of employing data assimilation techniques, such as the suite of algorithms provided by the DART software, to improve simulations of soil moisture using novel technology such as the cosmic-ray sensors available from the COSMOS network," Rosolem said.
The work has implications for future efforts to improve the quality of weather and climate predictions, agriculture monitoring, flood forecasts and drought monitoring. The research was recently published in Hydrology and Earth System Sciences.
Snow water resources
Snow is an important but poorly understood factor in global climate because of a lack of high-quality datasets.
Zong-Liang Yang, a professor of geoscience at The University of Texas at Austin, and his graduate student Yong-Fei Zhang have been using DART to improve the representation of snow in the land component of the Community Earth System Model. This is an earth system model composed of coupled atmosphere, ocean, land surface, sea ice, land ice and other models, and it is used by the wider climate research community.
The work is part of a multi-institution effort led by UT-Austin, along with NCAR and NASA, focused on developing a global-scale multi-sensor snow data assimilation system.
"DART fits my group's goal of developing a flexible and extensible land data assimilation system," said Yang. "Besides our prototype snow data assimilation, DART is useful for data assimilation involving other variables, such as soil moisture, skin temperature, and leaf area index from various satellite sources and ground observations."
Yang and his team recently published the results of their data assimilation effort in the Journal of Geophysical Research: Atmospheres.
Said Yang, "Such a truly multi-mission, multi-platform, multi-sensor, and multi-scale data assimilation system with DART will, ultimately, help constrain earth system models using all kinds of observations to improve their prediction skills."
Space debris
Solar radiation absorbed in the thermosphere (the layer of Earth's atmosphere directly above the mesosphere and directly below the exosphere) significantly affects the drag experienced by objects like satellites and spacecraft in low-Earth orbit. Because this drag varies with several factors, the positions of orbiting objects are uncertain, and that uncertainty could result in the loss of a spacecraft.
One way to decrease this uncertainty is to obtain more precise estimates of the atmosphere's neutral density from thermospheric models. An effective way to improve the accuracy of these models is data assimilation.
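The standard drag equation makes this dependence explicit: drag acceleration = 0.5 * rho * v^2 * Cd * A / m, which is linear in the neutral density rho. A given fractional error in modeled density therefore becomes the same fractional error in drag, and that error accumulates into position error along the orbit. The satellite parameters below are hypothetical round numbers.

```python
def drag_acceleration(density, speed, drag_coeff, area, mass):
    """Magnitude of atmospheric drag acceleration (m/s^2): 0.5 * rho * v^2 * Cd * A / m."""
    return 0.5 * density * speed ** 2 * drag_coeff * area / mass


# Hypothetical low-Earth-orbit satellite; all values are illustrative assumptions.
speed, cd, area, mass = 7.7e3, 2.2, 2.0, 500.0   # m/s, dimensionless, m^2, kg
modeled = drag_acceleration(1.0e-12, speed, cd, area, mass)  # density from a model (kg/m^3)
actual = drag_acceleration(1.5e-12, speed, cd, area, mass)   # true density 50% higher
print(f"relative drag error: {actual / modeled - 1:.0%}")
```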
In a recent paper published in the Journal of Atmospheric and Solar-Terrestrial Physics, Alexey Morozov and colleagues from the University of Michigan showed that DART was able to improve the accuracy of the Global Ionosphere-Thermosphere Model (GITM) by assimilating measurement data from CHAMP (Challenging Minisatellite Payload), a German satellite used for atmospheric research.
In their experiments, Morozov and his team used DART's data assimilation and machine learning capabilities to address shortcomings in GITM and to eliminate a bias they were finding in some simulations.
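A common way to correct a model bias with data assimilation is to treat an uncertain model parameter as part of the state and let observations update it. The sketch below does this with the ensemble adjustment Kalman filter, an algorithm Anderson developed and one of the filters DART provides. It works in two steps: first adjust each member's predicted observation, then regress those increments onto the parameter. The density multiplier, the baseline values and the noise-free synthetic observations are hypothetical, and this is not the Michigan team's actual configuration.

```python
import math


def eakf_update(params, predicted, obs, obs_var):
    """Ensemble adjustment Kalman filter update of a parameter from one scalar observation.

    params:    ensemble values of an uncertain model parameter
    predicted: each member's model prediction of the observed quantity
    """
    n = len(predicted)
    y_bar = sum(predicted) / n
    y_var = sum((y - y_bar) ** 2 for y in predicted) / (n - 1)
    post_var = 1.0 / (1.0 / y_var + 1.0 / obs_var)  # Gaussian product of prior and observation
    post_mean = post_var * (y_bar / y_var + obs / obs_var)
    shrink = math.sqrt(post_var / y_var)
    # Step 1: deterministically shift and contract the predicted observations.
    increments = [post_mean + shrink * (y - y_bar) - y for y in predicted]
    # Step 2: regress those observation-space increments onto the parameter.
    p_bar = sum(params) / n
    cov = sum((p - p_bar) * (y - y_bar) for p, y in zip(params, predicted)) / (n - 1)
    return [p + cov / y_var * dy for p, dy in zip(params, increments)]


# Hypothetical setup: modeled density = multiplier * baseline, while the synthetic
# "observed" density is 1.3 * baseline, so the uncorrected model is biased low.
TRUE_MULTIPLIER = 1.3
baselines = [1.0, 1.4, 0.8, 1.1, 0.9, 1.2, 1.0, 0.85, 1.3, 1.05]
multiplier = [0.8, 0.9, 1.0, 1.1, 1.2]  # initial ensemble centered on "no correction"

for base in baselines:
    predicted = [m * base for m in multiplier]
    obs = TRUE_MULTIPLIER * base  # noise-free synthetic observation
    multiplier = eakf_update(multiplier, predicted, obs, obs_var=0.01)

estimate = sum(multiplier) / len(multiplier)
remaining = abs(TRUE_MULTIPLIER - estimate) / abs(TRUE_MULTIPLIER - 1.0)
print(f"estimated multiplier: {estimate:.3f}, remaining bias: {remaining:.0%} of the original")
```

Because the multiplier enters the predicted density linearly, the regression step is exact here. With a nonlinear model such as GITM, it is only an approximation.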
"We had to get our hands wet at seeing if DART can do a simple thing--push a lever in the right direction to increase the density to match the CHAMP data," said Morozov, who now works at InvenSense, an intelligent sensor company. They found that, in some cases, DART removed up to 70 percent of the model's bias.
"The space weather research is one of a number of applications where we've let people do science where no one had been able to confront the models with observations before," said Anderson.
Assimilating the assimilators
Typically, when a researcher wants to add DART to their code, Anderson invites them to the NCAR campus for a week. There, he and his team work to understand how their code operates and determine how to incorporate DART so that the model--now using dynamic data--produces more accurate results.
There is a downside to data assimilation, however. Assimilating data into a simulation in a statistically accurate way requires running the simulation many times (sometimes up to 60), a process called ensemble forecasting.
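Ensemble forecasting itself is simple to sketch. Perturb the best estimate of the initial state slightly, run the model once per member, and summarize the spread of the results. The example below uses the Lorenz-63 system, a standard chaotic toy model in data assimilation research, with 60 members. The perturbation size, time step and starting state are assumptions. The vectorized step hides the cost, but each member is still a full model integration, so the computing bill grows with the ensemble size.

```python
import numpy as np


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 tendencies for an array of states with shape (..., 3)."""
    x, y, z = state[..., 0], state[..., 1], state[..., 2]
    return np.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], axis=-1)


def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step, applied to every ensemble member at once."""
    k1 = lorenz63(state)
    k2 = lorenz63(state + 0.5 * dt * k1)
    k3 = lorenz63(state + 0.5 * dt * k2)
    k4 = lorenz63(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def ensemble_forecast(initial, n_members=60, perturbation=1e-3, steps=500, dt=0.01, seed=0):
    """Run n_members forecasts from slightly perturbed initial states; return mean and spread."""
    rng = np.random.default_rng(seed)
    ensemble = np.asarray(initial, dtype=float) + perturbation * rng.standard_normal((n_members, 3))
    for _ in range(steps):
        ensemble = rk4_step(ensemble, dt)  # cost grows linearly with n_members
    return ensemble.mean(axis=0), ensemble.std(axis=0)


start = [1.0, 1.0, 1.0]  # hypothetical best estimate of the initial state
_, spread_short = ensemble_forecast(start, steps=100)   # lead time t = 1
_, spread_long = ensemble_forecast(start, steps=2000)   # lead time t = 20
print("spread at t=1: ", spread_short)
print("spread at t=20:", spread_long)
```

The tiny initial differences grow until the members cover much of the attractor. That growing spread is the information a single deterministic run cannot provide.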
"What we're trying to do is sample from a distribution of these ensemble and then make a forecast," Anderson said. This requires additional computing power, which, for already compute-hungry simulations, can be a challenge to find.
However, these extra runs do more than correct errors in the models. They also provide new information and allow scientists to ask different types of questions.
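One such question is how sensitive a forecast is to conditions earlier in the run. Across ensemble members, regressing a forecast quantity on an earlier state variable gives a slope and a correlation that measure the dependence. In the sketch below, the New Mexico wind and Oklahoma City rainfall arrays are synthetic numbers invented for illustration, not output from any study, and the function name is hypothetical.

```python
import numpy as np


def ensemble_sensitivity(predictor, metric):
    """Regression slope and correlation of a forecast metric on an earlier variable, across members."""
    cov = np.cov(np.asarray(predictor, dtype=float), np.asarray(metric, dtype=float))  # 2x2 sample covariance
    slope = cov[0, 1] / cov[0, 0]  # change in metric per unit change in predictor
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return slope, corr


# Synthetic 60-member ensemble, invented for illustration only.
rng = np.random.default_rng(1)
nm_wind = rng.normal(loc=10.0, scale=3.0, size=60)                   # earlier wind over New Mexico (m/s)
okc_rain = 2.0 * nm_wind + rng.normal(loc=0.0, scale=1.0, size=60)  # later rainfall at Oklahoma City (mm)
slope, corr = ensemble_sensitivity(nm_wind, okc_rain)
print(f"slope: {slope:.2f} mm per m/s, correlation: {corr:.2f}")
```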
"Ensemble forecasting offers an opportunity to study the sensitivity of forecasts, for instance, to correlate bad weather in Oklahoma City with winds over New Mexico," he said, citing a recent study his team was involved in.
Another thing data assimilation can do is identify errors in a model or an observing device. When researchers realize that their data doesn't match the model, they have an opportunity to find systematic problems, such as an out-of-alignment observing satellite or a bug in a forecasting code.
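A standard diagnostic for this is the innovation: the observation minus the model's forecast of it. If the model and the instrument are both unbiased, innovations should average about zero. A mean that sits many standard errors away from zero hints at a systematic problem. The sketch below applies a simple z-test, assuming independent innovations; the function name, the threshold of 3 and the data are hypothetical.

```python
import math


def innovation_check(observations, forecasts, threshold=3.0):
    """Return the mean innovation and whether it suggests a systematic offset.

    Innovations d = observation - forecast should average about zero when model
    and instrument are unbiased; independence of the innovations is assumed.
    """
    d = [y - f for y, f in zip(observations, forecasts)]
    n = len(d)
    mean = sum(d) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    if std == 0.0:
        return mean, mean != 0.0  # identical innovations: any nonzero offset is systematic
    z = mean / (std / math.sqrt(n))  # how many standard errors the mean is from zero
    return mean, abs(z) > threshold


forecasts = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
noise = [0.1, -0.1] * 4
biased_obs = [f + 0.5 + e for f, e in zip(forecasts, noise)]  # instrument reads 0.5 high
unbiased_obs = [f + e for f, e in zip(forecasts, noise)]
mean_b, flag_b = innovation_check(biased_obs, forecasts)
mean_u, flag_u = innovation_check(unbiased_obs, forecasts)
print(f"biased: mean {mean_b:.2f}, flagged {flag_b}; unbiased: mean {mean_u:.2f}, flagged {flag_u}")
```

The test only says that model and data disagree systematically. Deciding whether the satellite or the code is at fault still takes detective work.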
Simulation and modeling are often referred to as the third pillar of science, after theory and experimentation. But some have suggested that data-driven approaches, like the projects powered by DART, are fast becoming a fourth pillar.
With sensors and computer processing getting cheaper and more ubiquitous every year, it's not hard to imagine a world where data is available to an even greater degree than today. With it will come a need to use this data to calibrate models and improve predictions, and, as shown by the recent papers, DART is one effective way to do so.
"We do all this complicated statistics that puts these pieces together to make forecasts, and that's really been hard for the community as a whole to convey the importance of," Anderson said. "But the rest of these pieces don't fly without good assimilation in the center."