
[Bug]: Execution time idles for 60 seconds after single-model execution #176

Closed
1 task done
awaschick opened this issue Apr 11, 2023 · 13 comments · Fixed by #204
Labels
bug Something isn't working

Comments

@awaschick

awaschick commented Apr 11, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I execute my (very simple) test model from the dbt command line, execution claims to have taken 62 seconds:
[screenshot: dbt run output]

However, this does not reflect the actual performance of the queries in Dremio:
[screenshot: Dremio jobs view]

As you can see, the actual query execution time was less than 4 seconds for everything. Then, dbt apparently cooled its heels for 60 seconds until it hit, presumably, some sort of timeout.

This is new behavior compared to my experience with the (admittedly very old) previous versions of the dbt-dremio adapter. I understand this new version makes requests in a very different way, and I am very happy with all of the new features, but I don't understand why this is happening.

Expected Behavior

I expect the model queries to execute and the job to return with completed status immediately afterward.

Steps To Reproduce

  • Build single dbt model
  • Single thread profile or multi-thread profile doesn't matter
  • Object storage is not $scratch, but local storage
  • Run model against Dremio instance

Environment

- OS: Ubuntu in JupyterHub/Kubernetes
- dbt-dremio: 1.4.5
- Dremio Software: 24.0.0 on-premise
- Dremio Cloud: no

Relevant log output

[15:24:26] jovyan@jupyter-awaschick-40hlkagency-2ecom:~/shared/awaschick/jupyter-automation-qa-delivery-kit/notebooks/dbt/report $ dbt run --vars '{"dremio_database": "_sandbox", "schema_prefix": "hello"}'
15:24:33  Running with dbt=1.4.5
15:24:35  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 333 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
15:24:35  
15:24:38  Concurrency: 2 threads (target='default')
15:24:38  
15:24:38  1 of 1 START sql table model local_nfs_storage_0.dbt-scratch.qa-delivery-kit-dev.hello.hello  [RUN]
15:25:37  1 of 1 OK created sql table model local_nfs_storage_0.dbt-scratch.qa-delivery-kit-dev.hello.hello  [72138 in 59.19s]
15:25:38  
15:25:38  Finished running 1 table model in 0 hours 1 minutes and 2.19 seconds (62.19s).
15:25:38  
15:25:38  Completed successfully
15:25:38  
15:25:38  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
awaschick added the bug label Apr 11, 2023
@jlarue26
Contributor

@awaschick if there is no sensitive information, are you able to share the dbt.log file from under the project directory?

@awaschick
Author

@jlarue26 Certainly! I just re-ran the model fresh, with a similar performance outcome, yielding the log:
dbt.log

My profile for this is nothing out of the ordinary, but in case it is useful:

config:
  partial_parse: false
  use_colors: true
  send_anonymous_usage_stats: false

local_dremio:
  outputs:
    default:
      object_storage_source: local_nfs_storage_0
      object_storage_path: dbt-scratch.qa-delivery-kit-dev.hello
      password: [redacted]
      port: 9047
      software_host: dremio-client.dremio.svc.cluster.local
      threads: 1
      type: dremio
      use_ssl: false
      user: andy
  target: default

@awaschick
Author

@jlarue26 Also to confirm, the execution time reported in dbt.log, of 56.08 seconds, does not match the execution time of the job in Dremio:
[screenshot: Dremio job details]
Super strange.

@jlarue26
Contributor

Thanks @awaschick, we will take a look. I can't provide a timeframe for resolution yet, but we are starting to investigate.

@awaschick
Author

awaschick commented Apr 11, 2023

Another interesting thing on this issue: it seems my dbt job is pulling back the results of my materialization query! The server is not idling at all. Checking the master node's logs, I see it fielding a whole ton of requests like this:

10.42.2.30 - - [11/Apr/2023:21:01:04 +0000] "GET /api/v3/job/1bca3583-bd03-25a2-2dbf-84e88e0aab00/results?offset=23801&limit=100 HTTP/1.1" 200 559 "-" "python-requests/2.26.0"

The job, back in dbt, does in fact show a recordcount... but I don't understand why I would need to pull back all the data in a model. In fact for a lot of my models, which have millions of rows, this would be super-not-good. Is there a reason it's doing this? Can I prevent it from doing this?
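The pattern in that access log can be sketched like this (illustrative Python only, not the adapter's actual code; `get_page` is a stand-in for the `GET /api/v3/job/<id>/results` call):

```python
# Illustrative sketch of the paginated results fetch visible in the
# access log: repeated calls to the job-results endpoint with an
# increasing offset until every row has been pulled back.
# `get_page(offset, limit)` stands in for one HTTP round trip and
# returns {"rows": [...]}, the shape the log suggests.

def fetch_all_rows(get_page, total_row_count, limit=100):
    """Drain a paginated job-results endpoint page by page."""
    rows = []
    offset = 0
    while offset < total_row_count:
        page = get_page(offset=offset, limit=limit)  # one round trip
        if not page["rows"]:
            break  # defensive: stop if the server returns an empty page
        rows.extend(page["rows"])
        offset += len(page["rows"])
    return rows
```

With millions of rows and a page size of 100, this is tens of thousands of HTTP round trips, which would line up with the idle time observed above.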

@awaschick
Author

awaschick commented Apr 12, 2023

More insight through irresponsible hacky-cheating: If I make edits to cursor.py in my installed dbt-dremio in dbt/adapters/dremio/api, I can effect the expected performance by commenting out the while expression in _populate_job_results that iterates over all rows in the materialized table, e.g.

            # while current_row_count < total_row_count:
            combined_job_results["rows"].extend(
                job_results(
                    self._parameters,
                    self._job_id,
                    offset=current_row_count,
                    limit=row_limit,
                )["rows"]
            )

Presumably, this hack has all kinds of negative edge cases for operations that do not involve a straight-up table materialization. In this use case, however, I got precisely the intended results: the job completed promptly, the table materialized fully, and the operation ran in 3 seconds.

Is there a way to forego this step for tasks that do not require fully consuming the content of the remote table?
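One way to forgo it, sketched hypothetically (the `fetch` flag and the `job_results` signature here are assumptions mirroring the snippet above, not the adapter's real interface):

```python
# Hypothetical sketch: gate the row-draining loop behind a `fetch`
# flag so DDL-style statements (e.g. a table materialization) skip it
# entirely, while result-returning queries still paginate as before.
# `job_results` mimics the paginated REST call from the snippet above.

def populate_job_results(job_results, parameters, job_id,
                         total_row_count, fetch=True, row_limit=100):
    combined = {"rows": []}
    if not fetch:
        return combined  # materializations don't need their rows back
    current_row_count = 0
    while current_row_count < total_row_count:
        page = job_results(parameters, job_id,
                           offset=current_row_count, limit=row_limit)
        combined["rows"].extend(page["rows"])
        current_row_count += len(page["rows"])
    return combined
```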

@ravjotbrar
Contributor

Hi @awaschick, just out of curiosity, how many columns does your test model have?

@awaschick
Author

Hey @ravjotbrar! My test model has 45 columns. It's very basic, it does a select * from an existing materialized table.

@andersbogsnes
Copy link

We're also facing similar issues: the actual job will be done, but dbt will wait for quite some time before returning. We have also experienced timeouts while waiting.

@Conq1

Conq1 commented Jun 16, 2023

As far as I can tell from my own debugging, the issue comes from using the run_query macro, which in turn calls the statement macro with fetch_result=True.
By dropping the run_query macro and using fetch_result=False in my own materialization, I got rid of these wait times, during which dbt was basically fetching all the data from the just-created model down to my local machine.

Edit: There have been a few times where this didn't work. From the debug view I could see it still fetching all the data.

@Conq1

Conq1 commented Jun 26, 2023

Looks like fetch is actually hardcoded to always be True? Is this a bug?

https://github.com/dremio/dbt-dremio/blob/09cceed52ab5c004811555172fe1208b51ed36dc/dbt/adapters/dremio/connections.py#L179C21-L179C21

@Conq1

Conq1 commented Jun 26, 2023

Sorry for the spam. Python is not my native language, so it takes some time to dig deep.
Regarding why fetch is hardcoded to True in the execute method: it kind of makes sense, since add_query(), which is called a couple of lines earlier, will always fetch everything. This indicates that some refactoring is necessary to get fetch_result to work as intended. One idea could be to only call _populate_job_results if fetch_result is set to True, as @awaschick mentions.
I do feel, though, that the big question is why the Dremio API gives you the created table as part of the result when you run a CREATE TABLE query through the API. When you run the same query in the console you only get a single-line response saying the table was created, which I feel is normal SQL behavior.

@Conq1

Conq1 commented Jun 30, 2023

Final post and I will shut up. Thought I would post my fix here.
cursor.py:

    def execute(self, sql, bindings=None, fetch=True):
        if self.closed:
            raise Exception("CursorClosed")
        if bindings is None:
            self._initialize()

            json_payload = sql_endpoint(self._parameters, sql, context=None)

            self._job_id = json_payload["id"]

            self._populate_rowcount()
            if fetch:
                self._populate_job_results()
            self._populate_results_table()

        else:
            raise Exception("Bindings not currently supported.")

connection.py:

    def add_query(self, sql, auto_begin=True, bindings=None, abridge_sql_log=False, fetch=True):

        connection = self.get_thread_connection()
        if auto_begin and connection.transaction_open is False:
            self.begin()

        logger.debug(f'Using {self.TYPE} connection "{connection.name}". fetch={fetch}')

        with self.exception_handler(sql):
            if abridge_sql_log:
                logger.debug("On {}: {}....".format(connection.name, sql[0:512]))
            else:
                logger.debug("On {}: {}".format(connection.name, sql))

            pre = time.time()
            cursor = connection.handle.cursor()

            if bindings is None:
                cursor.execute(sql, fetch=fetch)
            else:
                logger.debug(f"Bindings: {bindings}")
                cursor.execute(sql, bindings, fetch=fetch)

            logger.debug(
                "SQL status: {} in {:0.2f} seconds".format(
                    self.get_response(cursor), (time.time() - pre)
                )
            )
            return connection, cursor

connection.py:

    def execute(
        self, sql: str, auto_begin: bool = False, fetch: bool = False
    ) -> Tuple[AdapterResponse, agate.Table]:
        sql = self._add_query_comment(sql)
        _, cursor = self.add_query(sql, auto_begin, fetch=fetch)
        response = self.get_response(cursor)
        # fetch = True
        if fetch:
            table = cursor.table
        else:
            table = dbt.clients.agate_helper.empty_table()

        return response, table

@ravjotbrar ravjotbrar linked a pull request Sep 6, 2023 that will close this issue
1 task
ravjotbrar added a commit that referenced this issue Nov 14, 2023
### Summary

Before this change, execution time was idling even after the job was
finished in Dremio. This was happening because the adapter would fetch
unnecessary data from the materialized model.

### Description

This change makes it so the adapter only fetches data from the
materialized model if fetch is set to true.

### Test Results

All tests pass

### Changelog

-   [x] Added a summary of what this PR accomplishes to CHANGELOG.md

### Related Issue

#176