Under the leadership of Dr. Qiang and Dr. Landry, I worked on this project, which uses statistical methods and tools to monitor water quality in five Florida nature preserves. One of my main tasks was developing Python programs so that the statistical analyses could run more efficiently. The original Markdown file was written by Dr. Qiang.

SEACAR_WQ_Pilot
The exploratory analyses include:
Maps are created to show the spatial distribution of the WQ samples.
Analyses in Task 1a are shared in:
Ordinary least squares (OLS) regression and Pearson correlation analyses were conducted to examine the relationships between the potential covariates and the water quality parameters. The analyses used data from 2016 to 2018, both pooled across all managed areas and for each managed area separately. The general procedure is:
Regression and correlation analyses are documented in:
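As a rough illustration of the regression and correlation step, the sketch below computes a Pearson correlation and a single-covariate OLS fit in plain Python. The function names and the sample values are hypothetical and are not taken from the project notebooks, which may well use a statistics library and several covariates at once.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical values: one covariate and one water quality parameter.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0]
parameter = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(covariate, parameter)
slope, intercept, r_squared = ols_fit(covariate, parameter)
print(f"r={r:.3f} slope={slope:.3f} intercept={intercept:.3f} R2={r_squared:.3f}")
```

With a single covariate, R² equals the square of the Pearson coefficient, which gives a quick consistency check between the two analyses.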
The following interpolation methods are selected for evaluation:
The interpolation programs call functions from the ArcGIS Python interface (arcpy). The performance of each method is evaluated through cross-validation in order to select the best-performing method for batch production. The following metrics were derived to evaluate model performance:
Performance evaluation of the interpolation methods is documented in:
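The cross-validation idea can be sketched as follows. This is a minimal leave-one-out example under stated assumptions: inverse distance weighting stands in for the arcpy interpolators, mean error and RMSE are shown only as two commonly used error summaries, and the sample points are made up. None of these choices is confirmed by the project.

```python
import math


def idw_predict(train, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    `train` is a list of (x, y, value) tuples. IDW is a stand-in
    interpolator chosen for brevity (an assumption).
    """
    num = 0.0
    den = 0.0
    for px, py, value in train:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exact hit on a sample location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(points, predict):
    """Hold out each point in turn, predict it from the rest,
    and summarise the prediction errors as (mean_error, rmse)."""
    errors = []
    for i, (x, y, observed) in enumerate(points):
        train = points[:i] + points[i + 1:]
        errors.append(predict(train, x, y) - observed)
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse


# Hypothetical sample points: (x, y, measured value).
samples = [(0.0, 0.0, 6.1), (1.0, 0.0, 6.4), (0.0, 1.0, 5.8),
           (1.0, 1.0, 6.0), (0.5, 0.5, 6.2)]

mean_error, rmse = leave_one_out(samples, idw_predict)
print(f"mean error={mean_error:.3f} RMSE={rmse:.3f}")
```

Because `leave_one_out` takes the predictor as an argument, the same loop can score any candidate method, so the candidates can be ranked on identical held-out points.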
Regression Kriging is applied to interpolate water quality parameters (Dissolved Oxygen, Total Nitrogen, Salinity, Secchi Depth, and Turbidity) in all seasons defined in this table for all managed areas. The interpolation algorithm utilizes the optimal combination of covariates identified in Task 1b.2.
Automate_Interpolation.ipynb: The main function that calls the interpolation function in autointerpolation.py. The program loads preprocessed data to save computing time.
autointerpolation.py: The interpolation function that can be applied to selected data points. The interpolation function is used in the main function to interpolate maps in all seasons.
Kernel density estimation (KDE) maps and aggregated standard error of prediction (SEP) maps are created in pairs for visual detection of sampling gaps and redundancies. The KDE and SEP maps are created from all data points from 2015 to 2019 for each parameter and in each managed area.
Gap_Analysis_Part1.md: Pairs of KDE and SEP maps for all sampling points from 2015 to 2019. The executable Python code that generates these maps can be found here.
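For readers unfamiliar with KDE, the sketch below evaluates a two-dimensional kernel density surface from sample locations. It assumes a Gaussian kernel, a hand-picked bandwidth, and made-up coordinates. The kernel, bandwidth, and tooling used for the project maps are not stated here and may differ.

```python
import math


def kde_at(x, y, samples, bandwidth):
    """Gaussian kernel density estimate at (x, y).

    `samples` is a list of (x, y) sampling locations. The Gaussian
    kernel and the bandwidth value are assumptions for illustration.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))
    total = 0.0
    for sx, sy in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


def kde_grid(samples, bandwidth, xs, ys):
    """Evaluate the density on a regular grid; rows follow ys, columns follow xs."""
    return [[kde_at(x, y, samples, bandwidth) for x in xs] for y in ys]


# Hypothetical sampling locations: a cluster near the origin plus one outlier.
stations = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0)]
grid = kde_grid(stations, bandwidth=0.5, xs=[0.0, 1.5, 3.0], ys=[0.0, 1.5, 3.0])

for row in grid:
    print(" ".join(f"{cell:.4f}" for cell in row))
```

High values mark densely sampled areas. Comparing them with the SEP surface over the same area is what allows gaps and redundancies to be spotted visually.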
Kernel density estimation (KDE) maps and aggregated standard error of prediction (SEP) maps are created for spring, summer, fall and winter from 2015 to 2019 for each parameter and in each managed area.
Gap_Analysis_Part2.md: Pairs of seasonal KDE and SEP maps. The executable Python code that generates these maps can be found here.
The KDE and SEP maps generated in Task 1c.1 are reclassified into low, neutral, and high classes using the 25th and 75th percentiles as thresholds. The reclassified KDE and SEP maps are then overlaid to identify redundant sampling points and gap areas according to the table below.
Kernel density | Standard error of prediction | Implication | Output |
---|---|---|---|
High | High | Natural variation, a potential seasonal issue, or unexplained variation | Display seasonal maps for SEACAR team to consider explanation |
High | Low | Potential redundancy | Identify specific sampling points within these areas |
Low | Low | No change needed / low priority | Reference only |
Low | High | Potential need for more stations | Identify areas on the map |
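The reclassification and overlay logic in the table above can be sketched in plain Python. This is a minimal illustration, not the project's arcpy workflow. The percentile interpolation rule, the strict inequalities at the thresholds, the "neutral" label for combinations the table does not cover, and the cell values are all assumptions.

```python
def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def reclassify(values):
    """Label each value low / neutral / high using the 25th and 75th percentiles."""
    p25 = percentile(values, 25)
    p75 = percentile(values, 75)
    return ["low" if v < p25 else "high" if v > p75 else "neutral"
            for v in values]


# (KDE class, SEP class) -> implication, following the table above.
IMPLICATION = {
    ("high", "high"): "Natural, seasonal, or unexplained variation",
    ("high", "low"): "Potential redundancy",
    ("low", "low"): "No change needed / low priority",
    ("low", "high"): "Potential need for more stations",
}


def overlay(kde_values, sep_values):
    """Combine the two reclassified maps cell by cell.

    Pairs involving a neutral class are not covered by the table and
    are reported as "neutral" here (an assumption).
    """
    pairs = zip(reclassify(kde_values), reclassify(sep_values))
    return [IMPLICATION.get(pair, "neutral") for pair in pairs]


# Hypothetical cell values for two co-registered maps, flattened to lists.
kde_cells = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sep_cells = [9, 8, 5, 5, 5, 5, 5, 1, 2]

result = overlay(kde_cells, sep_cells)
for kde, sep, label in zip(kde_cells, sep_cells, result):
    print(kde, sep, label)
```

Keeping the table as a lookup dictionary means that a change to an implication or the addition of a new class combination only touches the mapping, not the classification code.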