OPTIMIZATION: Support for omitting specific graph objects from file storage #404
Some entities and relationships do not need to be stored on the file system at all. We only need to store graph objects on the file system if we later intend to fetch the data from the job state or iterate over the `_type`. This PR introduces support for specifying metadata in each step that the `FileSystemGraphObjectStore` can use to omit specific data from being written to disk entirely. The data will still get uploaded.
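
As a rough illustration of the behavior described above, here is a minimal sketch of a store that always queues objects for upload but skips the on-disk index for types a step has flagged. Every name in it (`StepStorageMetadata`, `skipDiskTypes`, `SketchGraphObjectStore`, the `acme_*` types) is hypothetical and chosen for this example only. It is not the API this PR adds, and it uses in-memory collections in place of the real file system and upload path.

```typescript
// Hypothetical sketch: these names are illustrative, not the SDK's real API.
interface GraphObject {
  _key: string;
  _type: string;
}

// Assumed shape of per-step metadata: the `_type`s that no later step
// fetches from the job state or iterates over.
interface StepStorageMetadata {
  skipDiskTypes: ReadonlySet<string>;
}

class SketchGraphObjectStore {
  // Stand-in for the file system: objects indexed by `_type`.
  readonly disk = new Map<string, GraphObject[]>();
  // Stand-in for the upload queue.
  readonly uploaded: GraphObject[] = [];

  add(metadata: StepStorageMetadata, objects: GraphObject[]): void {
    for (const obj of objects) {
      // Every object is still uploaded, whatever the metadata says.
      this.uploaded.push(obj);

      // Flagged types are never written to the on-disk index.
      if (metadata.skipDiskTypes.has(obj._type)) {
        continue;
      }
      const bucket = this.disk.get(obj._type) ?? [];
      bucket.push(obj);
      this.disk.set(obj._type, bucket);
    }
  }

  // Iterating by `_type` only sees objects that were written to disk.
  iterate(type: string): GraphObject[] {
    return this.disk.get(type) ?? [];
  }
}

const store = new SketchGraphObjectStore();
store.add({ skipDiskTypes: new Set(["acme_log_entry"]) }, [
  { _key: "user:1", _type: "acme_user" },
  { _key: "log:1", _type: "acme_log_entry" },
]);
```

In this sketch both objects end up in `uploaded`, while only `acme_user` can be iterated afterwards. That is the trade-off the description implies: a type should be flagged only when no later step needs to read it back.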
A longer-term optimization would be to leverage the dependency graph to generate this information automatically. A follow-up improvement could also remove the locking behavior entirely in these cases: no step will rely on the unindexed data, so there is no reason to lock.