[Bug]: Execution time idles for 60 seconds after single-model execution #176
Comments
@awaschick, if there is no sensitive information, could you share the dbt.log file from the project directory?
@jlarue26 Certainly, sir! I just re-ran the model from scratch and saw similar performance, which produced this log:
My profile for this is nothing out of the ordinary, but in case it is useful:
@jlarue26 Also, to confirm, the execution time reported in
Thanks, @awaschick, we will take a look. I can't provide a timeframe for a resolution yet, but we are starting to investigate.
Another interesting finding on this issue: it seems my dbt job is pulling the results of my materialization query back! The server is not idling at all. Checking the master node's logs, I see it fielding a whole ton of requests like this:
The job, back in dbt, does in fact show a record count... but I don't understand why I would need to pull back all the data in a model. In fact, for a lot of my models, which have millions of rows, this would be super-not-good. Is there a reason it's doing this? Can I prevent it from doing this?
More insight through irresponsible hacky-cheating: if I make this edit in the adapter (commenting out the `while` line):

```python
# while current_row_count < total_row_count:
combined_job_results["rows"].extend(
    job_results(
        self._parameters,
        self._job_id,
        offset=current_row_count,
        limit=row_limit,
    )["rows"]
)
```

Presumably this hack has all kinds of negative edge cases for operations that are not a straight-up table materialization. In this use case, however, I got precisely the intended result: the job completed promptly, the table materialized fully, and the whole operation ran in 3 seconds. Is there a way to skip this step for tasks that do not need to consume the full contents of the remote table?
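To see why dbt appeared to idle, the sketch below mimics the adapter's offset/limit paging loop (the `while current_row_count < total_row_count` loop in the snippet above). It is a hypothetical illustration, not the adapter's code: `make_fake_job_results` and `fetch_everything` are invented names, the fake endpoint returns plain integers instead of real rows, and a page size of 500 is assumed for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

JobResultsFn = Callable[..., Dict[str, Any]]


def make_fake_job_results(total_rows: int) -> Tuple[JobResultsFn, Dict[str, int]]:
    """Build a hypothetical stand-in for the adapter's job_results() helper.

    It serves `total_rows` synthetic rows (plain ints) in offset/limit pages
    and records how many page requests were made.
    """
    calls = {"count": 0}

    def job_results(offset: int, limit: int) -> Dict[str, Any]:
        calls["count"] += 1
        end = min(offset + limit, total_rows)
        return {"rowCount": total_rows, "rows": list(range(offset, end))}

    return job_results, calls


def fetch_everything(job_results: JobResultsFn, row_limit: int = 500) -> List[Any]:
    # The first page also reports the total row count of the finished job.
    combined = job_results(offset=0, limit=row_limit)
    total_row_count = combined["rowCount"]
    current_row_count = len(combined["rows"])
    # Keep paging until every row is held locally: one request per page.
    while current_row_count < total_row_count:
        page = job_results(offset=current_row_count, limit=row_limit)["rows"]
        if not page:  # defensive: stop if the endpoint returns nothing
            break
        combined["rows"].extend(page)
        current_row_count += len(page)
    return combined["rows"]


if __name__ == "__main__":
    job_results, calls = make_fake_job_results(10_000)
    rows = fetch_everything(job_results, row_limit=500)
    print(len(rows), "rows fetched in", calls["count"], "requests")
```

With these numbers the loop issues 20 requests. At the same page size, a 1,000,000-row model would need 2,000 requests, all made after Dremio has already finished the job.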
Hi @awaschick, just out of curiosity, how many columns does your test model have?
Hey @ravjotbrar! My test model has 45 columns. It's very basic, it does a
We're also facing similar issues: the actual job will be done, but dbt will wait for quite some time before returning. We have also experienced timeouts while waiting.
As far as I can tell from my own debugging, the issue comes from using the `run_query` macro, which in turn calls the `statement` macro with `fetch_result=True`. Edit: I just had a few runs where this didn't hold up: from the debug view I could see it still fetching all the data.
It looks like `fetch` is actually hard-coded to always be `True`. Is this a bug?
Sorry for the spam. Python is not my native language, so it takes me some time to dig deep.
Final post, and then I will shut up. I thought I would post my fix here.
connection.py:
### Summary
Before this change, dbt kept idling even after the job had finished in Dremio. This happened because the adapter fetched unnecessary data from the materialized model.
### Description
This change makes the adapter fetch data from the materialized model only when `fetch` is set to true.
### Test Results
All tests pass.
### Changelog
- [x] Added a summary of what this PR accomplishes to CHANGELOG.md
### Related Issue
#176
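The idea behind the change can be illustrated with a small hypothetical sketch. `FakeCursor`, `execute`, and `_get_page` are invented for illustration and are not dbt-dremio's actual classes or signatures. The Dremio job is simulated, and each offset/limit page request is counted so the effect of the `fetch` flag is visible.

```python
from typing import List, Optional


class FakeCursor:
    """Hypothetical cursor showing 'only fetch results when fetch is true'.

    Not dbt-dremio's real API: the Dremio job is simulated, and every
    offset/limit page request is counted so the effect of `fetch` is visible.
    """

    def __init__(self, total_rows: int, row_limit: int = 500) -> None:
        self.total_rows = total_rows  # rows the (simulated) finished job produced
        self.row_limit = row_limit  # rows requested per page
        self.page_requests = 0
        self.rows: Optional[List[int]] = None

    def _get_page(self, offset: int) -> List[int]:
        # Stand-in for one call to the job-results endpoint.
        self.page_requests += 1
        end = min(offset + self.row_limit, self.total_rows)
        return list(range(offset, end))

    def execute(self, sql: str, fetch: bool = False) -> None:
        # Submitting `sql` and polling until the job completes is omitted here.
        if not fetch:
            # e.g. a table materialization: nothing needs to come back.
            self.rows = None
            return
        rows: List[int] = []
        while len(rows) < self.total_rows:
            page = self._get_page(offset=len(rows))
            if not page:  # defensive: stop if a page comes back empty
                break
            rows.extend(page)
        self.rows = rows


if __name__ == "__main__":
    cursor = FakeCursor(total_rows=10_000)
    cursor.execute("create table ... as select ...", fetch=False)
    print("fetch=False:", cursor.page_requests, "page requests")
    cursor.execute("select ...", fetch=True)
    print("fetch=True:", cursor.page_requests, "page requests")
```

In this sketch, a materialization takes the `fetch=False` branch and returns as soon as the job is done. Only a query whose rows are actually needed pays for the full paged download.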
Is there an existing issue for this?
Current Behavior
When I execute my (very simple) test model from the dbt command line, execution claims to have taken 62 seconds:
However, this does not reflect the actual performance of the queries in Dremio:
As you can see, the actual query execution time was less than 4 seconds for everything. Then, dbt apparently cooled its heels for 60 seconds until it hit, presumably, some sort of timeout.
This behavior is new compared with my experience using the (admittedly very old) previous versions of the dbt-dremio adapter. I understand this new version makes requests in a very different way, and I am very happy with all of the new features, but I don't understand why this is happening.
Expected Behavior
I expect the model queries to execute and the job to return with completed status immediately afterward.
Steps To Reproduce
$scratch
, but local storage
Environment
Relevant log output