More index optimisations #923

Merged 20 commits on Feb 26, 2024

shawnlaffan
Owner

This PR includes several optimisations. Chief among them are:

  • Implement _calc_abc_any for cases where a dependent method only needs the label hash keys or element lists
  • calc_abc grabs and adapts the results from calc_abc2 or calc_abc3 if either has already been run, thus saving some processing
  • calc_abc2 and calc_abc3 are always run before calc_abc
  • Add a hierarchical _calc_abc variant for cases such as cluster node calcs, where the label hashes can be generated from the child node results instead of processing the full list of terminals
  • Some other general index optimisations, such as result sharing when neighbour set 2 is empty

This commit special-cases the dependency calcs so that at least
one calc_abc sub is always run first when _calc_abc_any
is a dependency.
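
A minimal Python sketch of that ordering rule follows. It is not the project's implementation. Calc subs and their dependencies are modelled as a plain dict, and `_calc_abc_any` is treated as a placeholder that gets swapped for a concrete calc_abc variant. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) then produces a run order. The dict layout, the `resolve_order` helper, the preference for calc_abc2/calc_abc3, and the `calc_richness` name in the usage note are all assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Illustrative preference for satisfying _calc_abc_any: calc_abc2 and calc_abc3
# first, since calc_abc can later be derived from their results.
PREFERENCE = ("calc_abc2", "calc_abc3", "calc_abc")


def resolve_order(deps: dict[str, set[str]]) -> list[str]:
    """Order calc subs so a calc_abc variant runs before any _calc_abc_any user."""
    all_calcs = set(deps).union(*deps.values())
    needs_any = "_calc_abc_any" in all_calcs
    requested = [v for v in PREFERENCE if v in all_calcs]
    provider = requested[0] if requested else "calc_abc"

    graph: dict[str, set[str]] = {}
    for calc, calc_deps in deps.items():
        calc_deps = set(calc_deps)
        if "_calc_abc_any" in calc_deps:
            # Replace the placeholder with a concrete calc_abc variant.
            calc_deps.discard("_calc_abc_any")
            calc_deps.add(provider)
        graph[calc] = calc_deps
    if needs_any:
        graph.setdefault(provider, set())
    # calc_abc reuses calc_abc2/calc_abc3 output, so schedule it after them.
    if "calc_abc" in all_calcs:
        graph.setdefault("calc_abc", set()).update(
            v for v in ("calc_abc2", "calc_abc3") if v in all_calcs
        )
    return list(TopologicalSorter(graph).static_order())
```

For example, `resolve_order({"calc_richness": {"_calc_abc_any"}, "calc_abc": set(), "calc_abc2": set()})` places calc_abc2 ahead of both calc_richness and calc_abc. This is because calc_abc2 is already requested, and calc_abc can then reuse its results.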

Many indices only need the abc counts or the
keys of the label hashes.  These can use any of
calc_abc, calc_abc2 and calc_abc3.
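
The following is a hedged sketch of how such an "any" lookup could work. Any cached abc result is acceptable, because dependents read only the A/B/C counts and the label hash keys. The `cache` dict keyed by sub name, the `compute` callback and the result layout are assumptions, not the project's API.

```python
from typing import Callable

# Any of these precalculated results can satisfy a calc_abc_any request
# (sub names from the PR; the cache layout is an assumption).
ABC_VARIANTS = ("calc_abc", "calc_abc2", "calc_abc3")


def calc_abc_any(cache: dict[str, dict], compute: Callable[[], dict]) -> dict:
    """Return any available abc result, computing calc_abc only if none exist.

    Callers may rely only on the A/B/C counts and the label hash *keys*;
    the hash values differ between variants.
    """
    for name in ABC_VARIANTS:
        # Take a local reference: if the cache entry is deleted later,
        # this caller still holds a usable result.
        result = cache.get(name)
        if result is not None:
            return result
    result = compute()
    cache["calc_abc"] = result
    return result
```

Returning the first hit avoids any extra looping. The trade-off is that callers must not read the hash values.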

Recent commits added the capacity to grab any local
precalc result, and this can be used to support
a calc_abc_any sub.

calc_abc needs label hash values of 1, which calc_abc2
and calc_abc3 do not usually provide.
If calc_abc2 or calc_abc3 has already been calculated,
then we can just grab its results and set the label
hash values to 1.
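
Below is a sketch of that conversion. It assumes results are dicts whose `label_hash1`, `label_hash2` and `label_hash_all` entries map labels to counts; this layout is hypothetical. The hash values are reset to 1 in fresh dicts, so the cached calc_abc2/calc_abc3 results stay unchanged for their own dependents.

```python
from typing import Optional

# Hypothetical keys for the label hashes whose values calc_abc expects to be 1.
LABEL_HASH_KEYS = ("label_hash1", "label_hash2", "label_hash_all")


def abc_from_cached(cache: dict[str, dict]) -> Optional[dict]:
    """Derive a calc_abc-style result from a cached calc_abc2/calc_abc3 result.

    Returns None if neither has been calculated, so the caller can fall back
    to computing calc_abc from scratch.
    """
    for name in ("calc_abc2", "calc_abc3"):
        source = cache.get(name)
        if source is None:
            continue
        # Shallow copy: counts such as A, B and C are reused as-is.
        result = dict(source)
        for key in LABEL_HASH_KEYS:
            # New dicts with every value set to 1; the cached hashes are untouched.
            result[key] = dict.fromkeys(source[key], 1)
        return result
    return None
```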

This avoids a lot of looping in several circumstances,
for example with large neighbour sets in a spatial
analysis, or where cluster indices are calculated per node.

This will save some processing, given that calc_abc
can now adapt the results of calc_abc2 and calc_abc3.

Caches can be deleted with impunity, so they might be
interfered with in the middle of processing.

If there is no second neighbour set, then
there is no need to run all the hash processing.
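
The saving can be illustrated with a hypothetical layout. Label hashes are dicts of label to count, and the sketch assumes the usual a/b/c convention: A = labels in both sets, B = labels only in set 1, C = labels only in set 2. Under those assumptions, the merge loop can be skipped outright when the second hash is empty.

```python
def calc_abc_hashes(hash1: dict[str, int], hash2: dict[str, int]) -> dict:
    """Build label hashes and a/b/c counts for two neighbour sets (hypothetical layout)."""
    if not hash2:
        # No second neighbour set: skip the merge loop entirely.
        # Every label is unique to set 1, so A = C = 0.
        return {
            "label_hash1": hash1, "label_hash2": {}, "label_hash_all": dict(hash1),
            "A": 0, "B": len(hash1), "C": 0, "ABC": len(hash1),
        }
    combined = dict(hash1)
    for label, count in hash2.items():
        combined[label] = combined.get(label, 0) + count
    a = len(hash1.keys() & hash2.keys())
    b = len(hash1) - a
    c = len(hash2) - a
    return {
        "label_hash1": hash1, "label_hash2": hash2, "label_hash_all": combined,
        "A": a, "B": b, "C": c, "ABC": a + b + c,
    }
```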

If labels_hash2 is empty, then the central and whole
variants are the same, so just grab the results and remap
the keys.
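
A minimal sketch of that shortcut follows. It assumes, hypothetically, that central and whole results differ only by a key prefix, written here as `CENTRAL_` and `WHOLE_`; the real index names are not given in this PR. Reusing the whole results for central (rather than the reverse) is also an assumption. The `compute_central` callback stands in for the full calculation and is only called when `labels_hash2` is non-empty.

```python
from typing import Callable


def remap_keys(results: dict, old_prefix: str, new_prefix: str) -> dict:
    """Copy results, renaming keys that start with old_prefix (values are shared)."""
    return {
        (new_prefix + key[len(old_prefix):] if key.startswith(old_prefix) else key): value
        for key, value in results.items()
    }


def central_results(whole: dict, labels_hash2: dict,
                    compute_central: Callable[[], dict]) -> dict:
    if not labels_hash2:
        # Empty second set: central and whole coincide, so reuse and rename.
        return remap_keys(whole, "WHOLE_", "CENTRAL_")
    return compute_central()
```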

This will avoid a lot of processing when both
are being calculated, as the results will be
the same in such cases.

There is no need to recalculate everything in these cases.

This avoids a lot of extra computation for
calculations on cluster nodes, as internal nodes
can combine their child node results instead of
iterating over all the terminal elements.
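
The hierarchical variant can be sketched as a memoised post-order walk. A terminal node's label hash comes from its own data, and an internal node's hash is the sum of its children's hashes, so each terminal is visited once rather than once per ancestor. The `Node` class, the `cache` keyed by node name, and count-valued hashes are assumptions for illustration. For calc_abc-style presence hashes, the merged values would simply be set to 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `labels` is only populated on terminals."""
    name: str
    children: list["Node"] = field(default_factory=list)
    labels: dict[str, int] = field(default_factory=dict)


def node_label_hash(node: Node, cache: dict[str, dict[str, int]]) -> dict[str, int]:
    """Label hash for a node, built from cached child results where possible."""
    if node.name in cache:
        return cache[node.name]
    if not node.children:
        result = dict(node.labels)
    else:
        result = {}
        for child in node.children:
            # Combine child results instead of re-walking all terminals.
            for label, count in node_label_hash(child, cache).items():
                result[label] = result.get(label, 0) + count
    cache[node.name] = result
    return result
```

Very deep trees would need an iterative traversal to stay within Python's recursion limit.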

shawnlaffan merged commit 8216111 into master on Feb 26, 2024
8 checks passed
shawnlaffan deleted the calc_abc_any branch on February 26, 2024, 01:24