Register in the client context a state to avoid reconnecting on Azure. #32

quentingodeau · 2024-02-01T20:24:57Z

This commit refactors the code (mostly by moving it) to improve readability. It also adds an Azure context that is kept across file accesses within a query. Previously, when using a credential_chain provider and querying multiple files, each file triggered a new connection and a new authentication against Azure AD (now Entra). This is now done only once per query.
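
To illustrate the idea, here is a minimal sketch of per-query caching, not the extension's actual code. `FakeBlobClient`, `QueryContext`, `GetClient` and `CountAuthentications` are hypothetical names invented for this example; the only thing taken from the PR is the idea of a state registered in the client context and reused across file accesses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an authenticated Azure client (not the real SDK type).
// Each construction counts as one connection + authentication.
struct FakeBlobClient {
    explicit FakeBlobClient(int &auth_counter) { ++auth_counter; }
};

// Hypothetical per-query context: state registered here lives for the whole query.
struct QueryContext {
    std::unordered_map<std::string, std::shared_ptr<FakeBlobClient>> registered_state;
};

// Return the client cached in the query context, creating it on first use only.
std::shared_ptr<FakeBlobClient> GetClient(QueryContext &ctx, const std::string &account,
                                          int &auth_counter) {
    auto lookup = ctx.registered_state.find(account);
    if (lookup != ctx.registered_state.end()) {
        return lookup->second; // reuse: no new authentication
    }
    auto client = std::make_shared<FakeBlobClient>(auth_counter);
    ctx.registered_state.emplace(account, client);
    return client;
}

// Access every file within one query and report how many authentications happened.
int CountAuthentications(const std::vector<std::string> &files) {
    QueryContext ctx;
    int auth_counter = 0;
    for (const auto &file : files) {
        (void)file; // the file name is irrelevant for this sketch
        auto client = GetClient(ctx, "my_account", auth_counter);
        assert(client != nullptr);
    }
    return auth_counter;
}
```

With this shape, querying three files through the same `QueryContext` constructs the client once instead of three times.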

samansmink · 2024-02-01T20:34:21Z

hey @quentingodeau thanks a lot for the PR!

The CI failure for the distribution pipeline can be fixed by bumping the version of the reusable workflow duckdb/duckdb/.github/workflows/_extension_distribution.yml in .github/workflows/MainDistributionPipeline.yml to the latest master of duckdb (currently: 64785543f882f8c9b2f81ae71fa43eadf1653573)

edit: and the Linux Release failure seems to be caused by an upstream server that's dead, I just ran into the same error on another PR

quentingodeau · 2024-02-01T20:44:15Z

Oh, I misunderstood your comment, I'll roll back!

quentingodeau · 2024-02-08T21:54:26Z

Hi @samansmink, are the workflow changes the ones you asked me to make? I saw in your fork that you updated a SHA; is that what I was supposed to do?

quentingodeau · 2024-02-13T19:46:48Z

Hi again @samansmink,
for info, I have updated the PR with the changes that you made in the main branch :)
Regards,
Quentin

samansmink · 2024-02-13T19:54:47Z

hi @quentingodeau sorry for the delay on all your PRs, I was on holiday last week and am very busy with the 0.10 release this week. I'll review this one and the rest asap!

quentingodeau · 2024-02-13T19:57:22Z

No problem!
Take your time :)
I know that I have moved a lot of things, and sorry for that, but I saw that the way the extension/unit tests were built was causing issues when adding headers (because the include options were not propagated by CMake).

samansmink

Hey @quentingodeau thanks a lot for the PR, caching the BlobServiceClient for the duration of the query seems like a great idea!

Scanning a ~200 MB parquet file on a decently fast network I got a speedup from ~11.5s to ~9.4s with this PR. I would expect the difference to be even bigger on higher-latency networks.

I added 2 minor comments, the rest looks good!

samansmink · 2024-02-16T13:41:20Z

src/azure_filesystem.cpp

+	AzureContextState *result = nullptr;
+
+	auto &storage_account = client_context->registered_state[DEFAULT_AZURE_STORAGE_ACCOUNT];
+	try {


I'm not a big fan of the try-catch statement here. Could you rewrite this to not use one? I think something like:

	auto lookup = client_context->registered_state.find(DEFAULT_AZURE_STORAGE_ACCOUNT);
	if (lookup == client_context->registered_state.end()) {
		// Create one and insert
	} else {
		// Refresh if invalid, otherwise return
	}

is a lot cleaner, even though it might require 2 lookups in the registered_state map for AzureContextState creation.

Yes, I wanted to avoid the two lookups, but you are right: it is an unordered map, so a lookup should be close to O(1). I kept an old habit from when only map was available in the std ^^
Done in ec242c0
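
For readers following the thread, the find-based pattern suggested above could look roughly like the sketch below. It is an illustration under assumptions, not the code from ec242c0: `ContextState`, its `valid` flag, the `generation` counter and `GetOrCreateState` are hypothetical, and a plain `std::unordered_map` stands in for `registered_state`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical cached state; `valid` stands in for whatever validity check the real state uses.
struct ContextState {
    bool valid = true;
    int generation = 0; // only here to make a refresh observable in this sketch
};

using StateMap = std::unordered_map<std::string, std::shared_ptr<ContextState>>;

// Get-or-create without exceptions: one find(), plus one insertion on a miss.
std::shared_ptr<ContextState> GetOrCreateState(StateMap &registered_state, const std::string &key) {
    assert(!key.empty());
    auto lookup = registered_state.find(key);
    if (lookup == registered_state.end()) {
        // Not registered yet: create one and insert (the second lookup mentioned in the review).
        auto state = std::make_shared<ContextState>();
        registered_state.emplace(key, state);
        return state;
    }
    if (!lookup->second->valid) {
        // Registered but no longer valid: refresh it in place, reusing the iterator.
        auto fresh = std::make_shared<ContextState>();
        fresh->generation = lookup->second->generation + 1;
        lookup->second = fresh;
    }
    return lookup->second;
}
```

Both `find` and `emplace` are average-case constant time on `std::unordered_map`, so the extra lookup on the creation path costs little compared with establishing a connection.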

samansmink · 2024-02-16T14:11:03Z

src/azure_filesystem.cpp

+	return {container, prefix, path};
+}
+
+std::shared_ptr<AzureContextState> AzureStorageFileSystem::GetOrCreateStorageAccountContext(FileOpener *opener,


Can we maybe add an option for azure to enable/disable the caching feature? I think that would be nice to have as a workaround in case this causes issues for people.

Makes sense, done in ba56a50
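
A rough sketch of how such a switch might behave, assuming a boolean option checked on every access. The `context_caching` field and the other names are hypothetical and do not reflect the option actually added in ba56a50.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical cached state, numbered so that each creation is distinguishable.
struct ContextState {
    int id;
};

// Hypothetical query context with an on/off switch for the cache.
struct QueryContext {
    bool context_caching = true; // assumed flag name, for illustration only
    std::unordered_map<std::string, std::shared_ptr<ContextState>> registered_state;
    int created = 0; // number of states built so far
};

// With caching enabled, reuse the registered state; with caching disabled,
// build a fresh state on every call and never register it.
std::shared_ptr<ContextState> GetStorageContext(QueryContext &ctx, const std::string &account) {
    assert(!account.empty());
    if (ctx.context_caching) {
        auto lookup = ctx.registered_state.find(account);
        if (lookup != ctx.registered_state.end()) {
            return lookup->second;
        }
    }
    auto state = std::make_shared<ContextState>(ContextState{ctx.created++});
    if (ctx.context_caching) {
        ctx.registered_state.emplace(account, state);
    }
    return state;
}
```

Turning the flag off restores the pre-PR behavior of one connection per file access, which is what makes it usable as a workaround.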

quentingodeau · 2024-02-16T23:28:02Z

Greetings @samansmink thx for the review!!

samansmink · 2024-02-19T13:47:55Z

Looks great now, thanks again!

quentingodeau force-pushed the feature/performance branch from e5bd770 to bd92fdb on February 1, 2024 20:51

quentingodeau force-pushed the feature/performance branch from bd92fdb to 7891cdb on February 13, 2024 19:28

samansmink reviewed Feb 16, 2024

quentingodeau added 2 commits February 17, 2024 00:03

Avoid try-catch statement for context registration

ec242c0

Add an option to enable/disable the context caching

ba56a50

samansmink merged commit 923ff39 into duckdb:main Feb 19, 2024
18 checks passed

quentingodeau deleted the feature/performance branch February 19, 2024 16:58
