Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cached file management #434

Merged
merged 4 commits into from
Nov 15, 2024
Merged

Cached file management #434

merged 4 commits into from
Nov 15, 2024

Conversation

mkaruza
Copy link
Collaborator

@mkaruza mkaruza commented Nov 14, 2024

For each cached file we will now create additional metadata file that contains simple metadata. Added funcitons duckdb.cache_info() and duckdb.cache_delete(cache_key TEXT) that are used for managing cached files.

Write metadata for each cached http file once the file is downloaded
completely. This metadata file contains basic information that would
help us to have some management of cached files in system.
Added `duckdb.cache_info()` and `duckdb.cache_delete(cache_key TEXT)`
functions can be used to manage cached files.
@mkaruza mkaruza requested a review from wuputah November 14, 2024 09:20
Copy link
Collaborator

@wuputah wuputah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to add tests to this.

The code looks pretty straightforward but if @JelteF or @Y-- could have a quick look for particulars it would be appreciated.

CREATE TYPE duckdb.cache_info AS (
cache_key TEXT,
remote_path TEXT,
file_size TEXT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add the file's mtime here? it would be useful for clearing out older versions of files, basic LRU functions, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added timestamp cache creation time.

src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Show resolved Hide resolved
src/pgduckdb_options.cpp Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
src/pgduckdb_options.cpp Outdated Show resolved Hide resolved
Comment on lines 56 to 57
string metadata_info = cache_key + "," + remote_path + "," + std::to_string(total_size);
handle->Write((void *)metadata_info.c_str(), metadata_info.length(), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't Write take a const char*?

Suggested change
string metadata_info = cache_key + "," + remote_path + "," + std::to_string(total_size);
handle->Write((void *)metadata_info.c_str(), metadata_info.length(), 0);
handle->Write(cache_key.c_str(), cache_key.length(), 0);
handle->Write(",", 1, 0);
handle->Write(remote_path.c_str(), remote_path.length(), 0);
handle->Write(",", 1, 0);
auto s = std::to_string(total_size);
handle->Write(s.c_str(), s.length(), 0);

I'm also surprised we don't need a new line after?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write complains unless it is void * so we need to cast. Also, writing commas will also fail.
IMO, constructing single metadata line is simpler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough

@mkaruza mkaruza requested a review from Y-- November 15, 2024 10:36
Copy link
Collaborator

@Y-- Y-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the extra iteration!

@mkaruza mkaruza merged commit 0eb61d7 into main Nov 15, 2024
5 checks passed
@mkaruza mkaruza deleted the cached-file-metadata branch November 15, 2024 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants