Improve performance of type map creation #376
Merged
Please see the individual commits for extensive details. I’m happy to separate this into multiple PRs if some of the work needs more vetting or discussion; each commit can stand on its own, but separating them will require resolving conflicts, as commits 2 and 3 both depend on commit 1.
This is the result of an investigation after encountering performance issues with type map initialization being unexpectedly slow (after implementing a more time-sensitive use case) when the number of rows queried from `pg_type` is quite large (~131k in production). The enclosed performance testing was done in an ideal (and faster) development environment with a similar schema and a slightly larger `pg_type` set (~189k). In our production environment we were seeing type map creation take 4,102±550 ms, and since we create two type maps, that time currently doubles. Needless to say, that is quite slow, and these changes aim to improve it.

Finally, I am curious whether it is possible to reduce the number of rows actually used to generate the coder maps, but I’m not certain whether all of them are truly needed. It appears that every table with an array column gets two rows (one `array_in`, one `record_in`); since we have many tables, this ends up summing to quite a lot.
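For anyone wanting to check how much of their `pg_type` volume comes from these per-table rows, a diagnostic query along these lines (a sketch, not part of this PR; it simply groups catalog rows by their input function) can break the counts down:

```sql
-- Count pg_type rows by their input function. Per-table composite types
-- show up under record_in, and their array companions under array_in;
-- large counts there indicate rows generated per table rather than
-- genuinely distinct scalar types.
SELECT typinput::regproc AS input_fn, count(*) AS n
FROM pg_type
GROUP BY 1
ORDER BY 2 DESC;
```

If `record_in` and `array_in` dominate the output, that supports the idea that most of the ~131k rows stem from table definitions rather than types the coder maps actually need.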