Proposed Comparison and Standardization of Accumulation Results
July 20, 2015 [WDJ1]

 

 

Accuracy:

Proposed Testing

  • Initial testing should consist of verification of full upstream watershed areas.
  • Precipitation: differences in results can arise because of differences in the approaches used to summarize spatial layers at several points in the process. We therefore also propose comparing accumulation results from a shared, common dataset for each summarization type.

Summaries to be tested include maximum upstream value, minimum upstream value, area-weighted average, and upstream sum [WDJ2].
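
For discussion, a minimal sketch of the four summary types is below. The "upstream" lookup, column names, and values are illustrative placeholders only, not taken from any group's actual code set.

  import pandas as pd

  # Toy local-catchment table and upstream lookup; names and values are illustrative.
  local = pd.DataFrame(
      {"COMID": [1, 2, 3],
       "AreaSqKM": [4.2, 1.7, 2.9],          # local catchment area
       "PrecipMM": [900.0, 1100.0, 950.0]}   # local catchment mean precipitation
  ).set_index("COMID")

  upstream = {1: {1}, 2: {2}, 3: {1, 2, 3}}  # toy topology, target catchment included

  def summarize(comid):
      # Pull the local values for every catchment in the upstream network,
      # then compute the four candidate summaries.
      sub = local.loc[sorted(upstream[comid])]
      wts = sub["AreaSqKM"]
      return {"max_upstream": sub["PrecipMM"].max(),
              "min_upstream": sub["PrecipMM"].min(),
              "area_wtd_avg": (sub["PrecipMM"] * wts).sum() / wts.sum(),
              "upstream_sum": sub["PrecipMM"].sum()}

  print(summarize(3))  # summaries over catchments 1, 2, and 3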

EPA has included a raster (precip.tif) that was created by HSC. This dataset was used by the HSC team to create the accumulated precipitation values distributed with the NHDPlusV2 and was developed by combining 30-year PRISM precipitation normals with Mexican and Canadian climate data. The html file, PrecipitationQACheck.html, provides an example of how we compared our results to those of the NHDPlusV2, and we propose using this approach to compare [WDJ3] results among groups.
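
By way of illustration only, the sketch below captures the gist of that comparison: join any two accumulation tables on COMID and flag relative differences above a tolerance. The function name, column names, and 1% tolerance are assumptions, not the actual contents of PrecipitationQACheck.html, and the same approach applies equally to the watershed-area checks described under Notes from Previous Work.

  import pandas as pd

  def compare_accumulations(ours: pd.DataFrame, ref: pd.DataFrame,
                            value_col: str = "AccumPrecip",
                            tol_pct: float = 1.0) -> pd.DataFrame:
      # Join the two tables on COMID, compute the percent difference, and
      # return the rows exceeding the tolerance (candidates for closer review).
      merged = ours.merge(ref, on="COMID", suffixes=("_ours", "_ref"))
      merged["pct_diff"] = 100.0 * (
          merged[f"{value_col}_ours"] - merged[f"{value_col}_ref"]
      ) / merged[f"{value_col}_ref"]
      return merged[merged["pct_diff"].abs() > tol_pct]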

 

Notes from Previous Work:

EPA: We have found that full upstream watershed areas are the best way to check accumulation accuracy. When comparing precipitation accumulations, for example, it is possible to get very similar results for the wrong reasons because of spatial autocorrelation in rainfall patterns. We compared our accumulated areas against those distributed by the NHDPlusV2. Through these comparisons, we were able to diagnose several problems in our accumulation process, especially at inter-HydroRegion borders. In addition, we identified errors in the NHDPlusV2 accumulations that we were able to bring to the attention of Cindy McKay of Horizon Systems Corp (HSC). The file AccumulationsChecks.html shows an example of how we compared our watershed areas to those of the NHDPlusV2.

USGS AGAP/NFHP: We too have found that full upstream watershed areas are the best way to check accumulation accuracy. In previous work we used upstream watershed areas to verify the results of our accumulation as well. Similar to the EPA group, we were able to fine-tune our code and also found a number of COMIDs within the NHDPlusV1 with incorrect network areas, mostly due to braided networks. After finding incorrect values in the NHDPlusV1, we also performed a number of "manual" spot-checks to ensure accuracy. We then used data from the NLCD to further verify our calculations. Since our group discussions, we have updated our code and database to run on the NHDPlusV2 and have started verification using the upstream area values.

USGS NAWQA:

 

Speed:

Proposed Testing:

  • We propose testing the different tools in at least two environments. Daniel Wieferich has proposed using the USGS High Performance Cluster (HPC) because it will provide a fast, efficient way for everyone to access and run the final code. This test will compare the speed of each approach and will also verify that each approach is compatible with the USGS HPC. Second, we propose running each code set on a non-HPC computer to estimate the processing times that someone with access to more standard computing hardware may experience.
  • To compare speeds, we propose that the total time to process precip.tif for the conterminous USA be recorded for each code set [WDJ4].
    • Include details for this test here: the number of variables processed in the timed batch and the type of calculation (e.g., sum, min, max, area-weighted average). A timing sketch follows this list.
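
A minimal sketch of how the timing could be recorded is below. The accumulate_batch function stands in for whichever group's code set is being tested, and the choice of variables and number of repeat runs are placeholders to be agreed on.

  import time

  def time_code_set(accumulate_batch, variables, n_runs=3):
      # Return the mean wall-clock seconds to accumulate the given variables once.
      # accumulate_batch is a placeholder for the code set under test.
      elapsed = []
      for _ in range(n_runs):
          start = time.perf_counter()
          accumulate_batch(variables)   # e.g., a dozen national rasters incl. precip.tif
          elapsed.append(time.perf_counter() - start)
      return sum(elapsed) / len(elapsed)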

 

Methodology/Details:

Proposed Comparison/Testing:

  • Compare and discuss pros/cons of including the target local catchment in the summarization
    • AGAP/NFHP calculated network accumulations that include the target local catchment in the calculation, and we recommend keeping the target catchment in the calculation. That said, although most of our partners have seemed content with this decision, some argue that the network accumulation should not include the target local catchment and should instead focus on the upstream influence, so we are willing to discuss this further (a toy illustration of the two options follows this list).
  • Downstream aggregations of information
    • AGAP/NFHP suggests focusing on upstream accumulation right now but has received requests for downstream calculations from partners in coastal regions (e.g., number of dams downstream). Although this information is not a top priority, it would be interesting to hear thoughts and past experiences from others. We have tested our code set on the NHDPlusV1 but ran into issues and did not have time to fully investigate.
  • Address all-braids vs. mainstem accumulation
    • Is this something we want to think about? I believe we all intend to include all braids.
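
To make the target-catchment question above concrete, the toy sketch below computes the same upstream sum with and without the target local catchment. The topology, values, and names are purely illustrative.

  def upstream_sum(comid, upstream, local_values, include_target=True):
      # Sum local values over the upstream network, optionally dropping the
      # target catchment itself from the calculation.
      ids = set(upstream[comid])
      if not include_target:
          ids.discard(comid)
      return sum(local_values[c] for c in ids)

  local_values = {1: 4, 2: 2, 3: 3}   # toy local catchment values
  upstream = {3: {1, 2, 3}}           # toy network topology

  print(upstream_sum(3, upstream, local_values, include_target=True))   # 9
  print(upstream_sum(3, upstream, local_values, include_target=False))  # 6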

Ease of use and access:

Should we compare how easy each tool is to use [WDJ5]?

 

 


[WDJ1] This document mostly references accumulation. The attribution process will need to be addressed eventually.

[WDJ2] Please add additional summary calculations as needed.

[WDJ3] I think this is a good approach, but I would recommend we compare results for datasets that use different types of summaries as well.

[WDJ4] Should we specify the number of variables being used? I think we should run at least a dozen variables through at one time, and this number should be consistent across the code sets being tested.

[WDJ5] My opinion is that, no matter what, this work should result in an SOP or similar documentation for our final process, so complexity should be a lower priority unless it proves to add additional manual processing time.