Discovery Files

Inexpensive monitoring process powered by machine learning could aid in water treatment

Prediction tool provides support for chlorine-based disinfection

Small, rural drinking water treatment plants typically rely on chlorine alone for disinfection. A key performance measure for disinfection is free chlorine residual, the concentration of free chlorine remaining in the water after the chlorine has oxidized the target contaminants. Plant operators choose a chlorine dose to achieve a satisfactory residual concentration, but they must often estimate how much chlorine is required.

The challenge of determining an accurate concentration of chlorine has led to the use of advanced prediction techniques, including machine learning. By identifying correlations among numerous variables in complex systems, machine learning could be used to accurately predict free chlorine residual, even from cost-effective, low-tech monitoring data.

Researchers at Georgia Tech and other institutions implemented a machine learning model to predict free chlorine residual. The model uses gradient boosting algorithms, which combine many decision trees built in sequence, to generate predictions. Data were collected from a water treatment plant in Georgia and included a wide variety of monitoring records and operational process parameters, the variables that affect the quality, efficiency and cost of production. The work, supported in part by two grants from the U.S. National Science Foundation, is published in Frontiers of Environmental Science & Engineering.
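The gradient boosting idea described above can be illustrated with a minimal sketch. This is not the researchers' model or data: the two inputs (chlorine dose and turbidity), the synthetic relationship between them and the residual, and all function names are assumptions made up for illustration. Each round fits a one-split decision tree (a "stump") to the errors left by the previous rounds and adds a scaled-down copy of it to the running prediction.

```python
import random


def fit_stump(X, residuals):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left)
            sse += sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no feature has two distinct values: constant stump
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=40, learning_rate=0.1):
    """Boosting for squared error: each stump is fit to current residuals."""
    base = sum(y) / len(y)  # start from the mean of the targets
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, row)
            for p, row in zip(predictions, X)
        ]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mean_squared_error(model, X, y):
    return sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


# Synthetic, made-up data (NOT plant measurements):
# columns are a hypothetical chlorine dose and a hypothetical turbidity.
rng = random.Random(0)
X = [[rng.uniform(1.0, 4.0), rng.uniform(0.1, 2.0)] for _ in range(100)]
y = [0.6 * dose - 0.3 * turb + rng.gauss(0, 0.05) for dose, turb in X]

model = fit_gradient_boosting(X, y)
baseline = sum(y) / len(y)
baseline_mse = sum((yi - baseline) ** 2 for yi in y) / len(y)
print("mean-only MSE:", baseline_mse)
print("boosted MSE:  ", mean_squared_error(model, X, y))
```

On this toy data the boosted ensemble fits the training set more closely than predicting the mean alone. A real application would use a tested library and evaluate on held-out data rather than training error.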

The research team developed four iterations of a generalized modeling approach and applied open-source software to interpret machine learning models with many input parameters, allowing users to visually understand how each parameter affects prediction.
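The article does not name the interpretation software, so the sketch below swaps in a simpler, well-known technique, partial dependence, to show what "how each parameter affects prediction" can mean in practice. The stand-in model, its coefficients, the input names and the sample rows are all hypothetical. For each candidate value of one input, that input is overwritten in every row and the model's predictions are averaged.

```python
def partial_dependence(predict_fn, X, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)  # copy so the original data is untouched
            modified[feature_index] = value
            total += predict_fn(modified)
        curve.append(total / len(X))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a trained model (made-up coefficients)."""
    dose, turbidity = row
    return 0.6 * dose - 0.3 * turbidity


# Made-up rows: [hypothetical chlorine dose, hypothetical turbidity]
X = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 0.2]]

dose_curve = partial_dependence(toy_model, X, 0, [1.0, 2.0, 3.0, 4.0])
turbidity_curve = partial_dependence(toy_model, X, 1, [0.2, 1.0, 2.0])
print("dose curve:     ", dose_curve)
print("turbidity curve:", turbidity_curve)
```

Plotting each curve against its grid gives a visual summary of the direction and size of an input's effect. In this toy model the prediction rises with dose and falls with turbidity.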

The fourth and final iteration considered only intuitive, physical relationships and water quality measured downstream from filtration. The team identified three key findings: 1) with enough related input parameters, machine learning models can produce accurate predictions; 2) machine learning models can be driven by correlations that may or may not have a physical basis; and 3) machine learning models can be analogous to operator experience.

The research team suggests that future studies should explore expanding the applicability domain. For example, the data set they analyzed covered only one full year. Greater data availability is therefore expected to broaden the applicability domain and improve predictive performance.