String deduplication #1269

Merged
merged 6 commits into feature/MakePropertiesTyped on Sep 20, 2023


ljeub-pometry
Collaborator

What changes were proposed in this pull request?

  • Store all strings as an immutable ArcStr (a wrapper around Arc<str>) to make cloning of properties cheap
  • Implement a per-graph string pool so that duplicate string values are stored once, with every occurrence pointing to the same underlying string
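
For readers unfamiliar with the pattern, here is a minimal sketch of how the two changes above can fit together. It is not the code from this PR: the `ArcStr` shown here is a hypothetical stand-in that only mirrors the name, and `StringPool`, `intern`, `ptr_eq` and `len` are names assumed for illustration.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an `ArcStr` type: an immutable string whose
/// `clone` only bumps a reference count instead of copying the bytes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ArcStr(Arc<str>);

impl ArcStr {
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True when both handles point at the same allocation.
    pub fn ptr_eq(a: &ArcStr, b: &ArcStr) -> bool {
        Arc::ptr_eq(&a.0, &b.0)
    }
}

/// Lets the pool be queried with a plain `&str` without allocating.
/// The derived `Hash`/`Eq` delegate to `str`, so the `Borrow` contract holds.
impl Borrow<str> for ArcStr {
    fn borrow(&self) -> &str {
        &self.0
    }
}

/// Hypothetical per-graph pool (assumed name): one stored copy per distinct value.
#[derive(Default)]
pub struct StringPool {
    pool: HashSet<ArcStr>,
}

impl StringPool {
    /// Return the pooled handle for `value`, allocating only on first sight.
    pub fn intern(&mut self, value: &str) -> ArcStr {
        if let Some(existing) = self.pool.get(value) {
            return existing.clone();
        }
        let fresh = ArcStr(Arc::from(value));
        self.pool.insert(fresh.clone());
        fresh
    }

    /// Number of distinct strings currently held by the pool.
    pub fn len(&self) -> usize {
        self.pool.len()
    }
}
```

Interning the same value twice through one `StringPool` returns two handles for which `ArcStr::ptr_eq` is true, so a property value that repeats across many nodes or edges costs one allocation plus a pointer per occurrence. This sketch is single-threaded and never evicts entries; a real graph would presumably need a concurrent map, which is outside what the description above states.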

Why are the changes needed?

  • memory optimisation

Does this PR introduce any user-facing change? If yes is this documented?

No, the changes are completely transparent to the user-facing APIs.

How was this patch tested?

Existing tests still pass, and a new test checks that the deduplication is effective.
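
As an illustration of what such a check can look like (a sketch only, not the test added in this PR), one option is to count the distinct allocations behind a set of handles. The helpers `intern` and `distinct_allocations` are hypothetical names, and a plain `HashSet<Arc<str>>` stands in for the graph's pool.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical helper: return the pooled copy of `value`, adding it if absent.
fn intern(pool: &mut HashSet<Arc<str>>, value: &str) -> Arc<str> {
    if let Some(existing) = pool.get(value) {
        return Arc::clone(existing);
    }
    let fresh: Arc<str> = Arc::from(value);
    pool.insert(Arc::clone(&fresh));
    fresh
}

/// Count the distinct heap allocations behind a slice of handles by
/// comparing the addresses of their string data.
fn distinct_allocations(values: &[Arc<str>]) -> usize {
    values
        .iter()
        // `as_ptr` here is `str::as_ptr`, reached through auto-deref.
        .map(|v| v.as_ptr() as usize)
        .collect::<HashSet<_>>()
        .len()
}
```

A test built on these helpers would intern a list containing repeated values and assert that `distinct_allocations` equals the number of distinct values, whereas building each `Arc<str>` independently gives one allocation per element.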

Collaborator

@fabianmurariu fabianmurariu left a comment

Apart from the serialisation memory issue, which comes with its own set of problems, this LGTM

@Haaroon
Contributor

Haaroon commented Sep 18, 2023

I too approve. With large files I have seen a reduction of about 15G in RAM: on a previous file set (netflow and logs on day2) I saw 55G of Raphtory RAM usage, and with these changes the graph now uses 40.7G of RAM.

@ljeub-pometry ljeub-pometry merged this pull request into feature/MakePropertiesTyped Sep 20, 2023
@ljeub-pometry ljeub-pometry deleted the feature/ArcStr branch September 20, 2023 11:10
ljeub-pometry added a commit that referenced this pull request Sep 20, 2023
* eliminate a lot of arc clones

* replace String by ArcStr (wrapped Arc<str>) for cheap clone and to make it possible to support string deduplication

* test string deduplication

* implement string deduplication for property values

* clean up warnings

* expose meta data in core ops and minor cleanup
miratepuffin pushed a commit that referenced this pull request Sep 20, 2023
* take property insertion apart and put it back together again

* fix tests that were testing broken behaviour

* remove `"_id"` from properties and change stray ints to floats in python tests

* fix warnings

* String deduplication (#1269)

* eliminate a lot of arc clones

* replace String by ArcStr (wrapped Arc<str>) for cheap clone and to make it possible to support string deduplication

* test string deduplication

* implement string deduplication for property values

* clean up warnings

* expose meta data in core ops and minor cleanup

* fix rebase issues and clean up warnings

* dubious warning fix

* attribute does not work, warning is still there

* simplify edge addition and deletion

* No more spin-locking for adding edges (instead get locks in consistent order)
fabianmurariu pushed a commit that referenced this pull request May 21, 2024
3 participants