Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

H-3291: Implement a pre-processor for entities and infer data type IDs #5006

Merged

Conversation

TimDiekmann
Copy link
Member

🌟 What is the purpose of this PR?

The graph eventually needs to change entities before they are inserted into the database. Initially, this required for two cases:

  • Store the data type ID if it is inferred (a temporary workaround until we always pass it)
  • Calculate the canonical value for a value

🔍 What does this change?

  • Introduce EntityVisitor - a mutating visitor which recursively visits all properties
    • The name may change to distinguish between a mutable and an immutable visitor, currently, only the mutable version is implemented
    • It defaults to the corresponding walk_* functions, which implement traversal logic and validate the entity shape
  • Add an EntityPreprocessor which implements EntityVisitor
    • Does the same property validation logic as the previous entity validator
    • Fails, if a data type ID is not set and cannot be inferred. If it can be inferred, it's stored on the entity metadata. This implies that all entities will now have data type IDs stored alongside values in the database - a very important step towards data-type based queries
  • Remove the old property validation logic (it's covered in EntityPreprocessor
  • Use the `EntityPreprocessor in
    • entity creation
    • entity patching
    • snapshot restoration

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

  • Error handling currently enforces a single error enum. This needs to be improved
  • The EntityPreprocessor is defined in the validation crate but also does infer data type IDs, this might be confusing and should be split.

🛡 What tests cover this?

  • The integration test suite checks for the data type ID being set after none were provided

@TimDiekmann TimDiekmann self-assigned this Sep 6, 2024
@github-actions github-actions bot added area/deps Relates to third-party dependencies (area) area/apps > hash* Affects HASH (a `hash-*` app) area/libs Relates to first-party libraries/crates/packages (area) type/eng > backend Owned by the @backend team area/tests New or updated tests area/tests > integration New or updated integration tests area/apps area/apps > hash-graph labels Sep 6, 2024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from data_type.rs and removed non-test code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from property_type.rs and removed non-test code

…and-entity-metadata-before

# Conflicts:
#	libs/@local/hash-validation/src/property_type.rs
vilkinsons
vilkinsons previously approved these changes Sep 9, 2024
…and-entity-metadata-before

# Conflicts:
#	libs/@local/hash-validation/src/property_type.rs
Copy link
Contributor

github-actions bot commented Sep 9, 2024

Benchmark results

@rust/graph-benches – Integrations

scaling_read_entity_complete_one_depth

Function Value Mean Flame graphs
entity_by_id 10 entities $$31.3 \mathrm{ms} \pm 158 \mathrm{μs}\left({\color{lightgreen}-35.161 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 50 entities $$273 \mathrm{ms} \pm 1.71 \mathrm{ms}\left({\color{lightgreen}-82.390 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$20.2 \mathrm{ms} \pm 92.4 \mathrm{μs}\left({\color{gray}0.385 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 25 entities $$74.0 \mathrm{ms} \pm 533 \mathrm{μs}\left({\color{gray}-2.737 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 5 entities $$25.5 \mathrm{ms} \pm 325 \mathrm{μs}\left({\color{gray}2.20 \mathrm{\%}}\right) $$ Flame Graph

scaling_read_entity_complete_zero_depth

Function Value Mean Flame graphs
entity_by_id 10 entities $$2.09 \mathrm{ms} \pm 11.4 \mathrm{μs}\left({\color{gray}3.54 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 50 entities $$4.14 \mathrm{ms} \pm 25.6 \mathrm{μs}\left({\color{red}5.81 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$1.88 \mathrm{ms} \pm 10.4 \mathrm{μs}\left({\color{gray}1.67 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 25 entities $$2.71 \mathrm{ms} \pm 93.1 \mathrm{μs}\left({\color{gray}3.76 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 5 entities $$1.91 \mathrm{ms} \pm 10.5 \mathrm{μs}\left({\color{gray}-0.896 \mathrm{\%}}\right) $$ Flame Graph

scaling_read_entity_linkless

Function Value Mean Flame graphs
entity_by_id 10 entities $$1.86 \mathrm{ms} \pm 7.43 \mathrm{μs}\left({\color{gray}-0.593 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 100 entities $$2.01 \mathrm{ms} \pm 10.2 \mathrm{μs}\left({\color{gray}-1.041 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1000 entities $$2.73 \mathrm{ms} \pm 11.8 \mathrm{μs}\left({\color{gray}-4.616 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$1.90 \mathrm{ms} \pm 11.2 \mathrm{μs}\left({\color{gray}2.74 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10000 entities $$13.0 \mathrm{ms} \pm 136 \mathrm{μs}\left({\color{gray}-2.979 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity_type

Function Value Mean Flame graphs
get_entity_type_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 $$1.44 \mathrm{ms} \pm 10.1 \mathrm{μs}\left({\color{gray}2.11 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity

Function Value Mean Flame graphs
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$16.2 \mathrm{ms} \pm 192 \mathrm{μs}\left({\color{lightgreen}-35.658 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$16.2 \mathrm{ms} \pm 200 \mathrm{μs}\left({\color{gray}-3.233 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$16.4 \mathrm{ms} \pm 194 \mathrm{μs}\left({\color{lightgreen}-35.401 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$16.3 \mathrm{ms} \pm 225 \mathrm{μs}\left({\color{red}13.3 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$15.5 \mathrm{ms} \pm 214 \mathrm{μs}\left({\color{gray}-1.153 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$15.4 \mathrm{ms} \pm 182 \mathrm{μs}\left({\color{lightgreen}-6.454 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$15.6 \mathrm{ms} \pm 177 \mathrm{μs}\left({\color{lightgreen}-36.486 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$16.6 \mathrm{ms} \pm 204 \mathrm{μs}\left({\color{lightgreen}-26.672 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$16.2 \mathrm{ms} \pm 197 \mathrm{μs}\left({\color{gray}-4.004 \mathrm{\%}}\right) $$ Flame Graph

representative_read_multiple_entities

Function Value Mean Flame graphs
link_by_source_by_property depths: DT=2, PT=2, ET=2, E=2 $$95.6 \mathrm{ms} \pm 459 \mathrm{μs}\left({\color{gray}1.96 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=2 $$76.3 \mathrm{ms} \pm 459 \mathrm{μs}\left({\color{gray}1.18 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=0 $$39.5 \mathrm{ms} \pm 233 \mathrm{μs}\left({\color{gray}0.455 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=2, E=2 $$86.6 \mathrm{ms} \pm 687 \mathrm{μs}\left({\color{gray}1.36 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=2, ET=2, E=2 $$90.9 \mathrm{ms} \pm 488 \mathrm{μs}\left({\color{gray}0.501 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=255, PT=255, ET=255, E=255 $$104 \mathrm{ms} \pm 710 \mathrm{μs}\left({\color{gray}-1.785 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=2, PT=2, ET=2, E=2 $$55.9 \mathrm{ms} \pm 461 \mathrm{μs}\left({\color{gray}1.22 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=0, E=2 $$40.8 \mathrm{ms} \pm 235 \mathrm{μs}\left({\color{gray}0.198 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=0, E=0 $$37.4 \mathrm{ms} \pm 230 \mathrm{μs}\left({\color{gray}2.06 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=2, E=2 $$47.9 \mathrm{ms} \pm 286 \mathrm{μs}\left({\color{gray}2.37 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=2, ET=2, E=2 $$52.1 \mathrm{ms} \pm 276 \mathrm{μs}\left({\color{gray}0.673 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=255, PT=255, ET=255, E=255 $$65.1 \mathrm{ms} \pm 326 \mathrm{μs}\left({\color{gray}2.10 \mathrm{\%}}\right) $$ Flame Graph

Merged via the queue into main with commit 1cb6fcd Sep 9, 2024
158 of 159 checks passed
@TimDiekmann TimDiekmann deleted the t/h-3291-allow-changing-entities-and-entity-metadata-before branch September 9, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-graph area/apps area/deps Relates to third-party dependencies (area) area/libs Relates to first-party libraries/crates/packages (area) area/tests > integration New or updated integration tests area/tests New or updated tests type/eng > backend Owned by the @backend team
Development

Successfully merging this pull request may close these issues.

2 participants