STORMODB Database Performance Improvement Strategy:
There are a number of issues that result in some of the Portal test queries running for multiple minutes on STORMODB. (For test query script and results, see https://my.usgs.gov/confluence/display/qwdp/13+September+2011+EPA+and+CIDA). The most difficult of these issues is that some of the queries show a different execution plan when compared to the CIDA version of the database where the test queries run much faster.
There are several differences between the environments that result in these execution plan problems and other performance issues:
- Compression differences. The compression on STORMODB results in FA_REGULAR_RESULT being 32 Gb, while it is less than half that on the CIDA version. Not sure what would cause that. It is a very significant difference that causes a 2X disk-read difference on some queries and may also affect optimizer choices.
- Partitioning differences. This is the main difference affecting test queries #5 and #9, resulting in queries of 2-6 minutes in STORMODB. These typically run in about 1/5 of the time at CIDA, due to one-year partitions in FA_REGULAR_RESULT.
- Index differences. There are many indexes present on the STORMODB version that are not present on the CIDA version. Occasionally these other indexes are chosen for portal queries and result in poorly-performing queries.
- "Analyze" differences. We may need to look at how statistics are gathered.
The "execution plan" differences may go away if the compression, partitioning, indexing and analyze differences are addressed. Another promising approach would be to build a "custom" copy of certain artifacts within STORET and configure them to support the Portal. This would essentially be a Portal DataMart that would look something like the CIDA version, but would be managed by the EPA's ETL process and remain under EPA stewardship. This approach would need some analysis to determine the impact to current STORET operations and storage resources as well as the time required to stand it up.
CIDA and EPA staff will work together in the coming months up to April 2012 to improve Portal test query times on STORMODB to come within an acceptable tolerance of the benchmark query times from the CIDA version. The following strategy could be applied and tested on STORMODB to track any resulting performance gains (or losses). It is recommended that the issues regarding compression and partitioning be resolved before further investigating the explain plan and index differences.
- Fix Compression: CIDA and EPA will try to resolve the differences in compression between STORMODB and the CIDA version.
- Enable Partitioning: CIDA and EPA will implement and analyze the effects of applying one-year partitioning to FA_REGULAR_RESULT on STORMODB
- Run Explain Plans: CIDA will run and share with EPA the explain plans from STORMODB and from the CIDA version for the problematic test queries. These will be used for comparison as different strategies are tested.
- Evaluate Indexing: CIDA will provide EPA with a list of indexes needed for Portal queries. CIDA and EPA will look into the possibility of making indexes invisible/visible to the Portal user, or potentially removing unnecessary indexes from STORMODB.
- Analyze Statistics: CIDA and EPA will look into possible ‘Analyze’ differences between STORMODB and the CIDA version.