
[Bug]: Execution time idles for 60 seconds after single-model execution #176

Closed
1 task done
awaschick opened this issue Apr 11, 2023 · 13 comments · Fixed by #204
Labels
bug Something isn't working

Comments

@awaschick

awaschick commented Apr 11, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I execute my (very simple) test model from the dbt command line, execution claims to have taken 62 seconds:
[screenshot: dbt run output]

However, this does not reflect the actual performance of the queries in Dremio:
[screenshot: Dremio jobs view]

As you can see, the actual query execution time was less than 4 seconds for everything. Then, dbt apparently cooled its heels for 60 seconds until it hit, presumably, some sort of timeout.

This is new behavior compared to my experience with the (admittedly very old) previous versions of the dbt-dremio adapter. I understand this new version makes requests in a very different way, and I am very happy with all of the new features, but I don't understand why this is happening.

Expected Behavior

I expect the model queries to execute and the job to return with completed status immediately afterward.

Steps To Reproduce

  • Build single dbt model
  • Single thread profile or multi-thread profile doesn't matter
  • Object storage is not $scratch, but local storage
  • Run model against Dremio instance

Environment

- OS: Ubuntu in JupyterHub/Kubernetes
- dbt-dremio: 1.4.5
- Dremio Software: 24.0.0 on-premise
- Dremio Cloud: no

Relevant log output

[15:24:26] jovyan@jupyter-awaschick-40hlkagency-2ecom:~/shared/awaschick/jupyter-automation-qa-delivery-kit/notebooks/dbt/report $ dbt run --vars '{"dremio_database": "_sandbox", "schema_prefix": "hello"}'
15:24:33  Running with dbt=1.4.5
15:24:35  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 333 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
15:24:35  
15:24:38  Concurrency: 2 threads (target='default')
15:24:38  
15:24:38  1 of 1 START sql table model local_nfs_storage_0.dbt-scratch.qa-delivery-kit-dev.hello.hello  [RUN]
15:25:37  1 of 1 OK created sql table model local_nfs_storage_0.dbt-scratch.qa-delivery-kit-dev.hello.hello  [72138 in 59.19s]
15:25:38  
15:25:38  Finished running 1 table model in 0 hours 1 minutes and 2.19 seconds (62.19s).
15:25:38  
15:25:38  Completed successfully
15:25:38  
15:25:38  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
awaschick added the bug label Apr 11, 2023
@jlarue26
Contributor

@awaschick if there is no sensitive information, are you able to share the dbt.log file from under the project directory?

@awaschick
Author

@jlarue26 Certainly! I just re-ran the model fresh, with a similar performance outcome, yielding the log:
dbt.log

My profile for this is nothing out of the ordinary, but in case it is useful:

config:
  partial_parse: false
  use_colors: true
  send_anonymous_usage_stats: false

local_dremio:
  outputs:
    default:
      object_storage_source: local_nfs_storage_0
      object_storage_path: dbt-scratch.qa-delivery-kit-dev.hello
      password: [redacted]
      port: 9047
      software_host: dremio-client.dremio.svc.cluster.local
      threads: 1
      type: dremio
      use_ssl: false
      user: andy
  target: default

@awaschick
Author

@jlarue26 Also to confirm, the execution time reported in dbt.log, of 56.08 seconds, does not match the execution time of the job in Dremio:
[screenshot: Dremio job details]
Super strange.

@jlarue26
Contributor

Thanks @awaschick, we will take a look. I can't provide a timeframe for resolution yet, but we are starting to investigate.

@awaschick
Author

awaschick commented Apr 11, 2023

Another interesting thing on this issue: it seems my dbt job is pulling back the results of my materialization query! The server is not idling at all. Checking the master node's logs, I see it fielding a whole ton of requests like this:

10.42.2.30 - - [11/Apr/2023:21:01:04 +0000] "GET /api/v3/job/1bca3583-bd03-25a2-2dbf-84e88e0aab00/results?offset=23801&limit=100 HTTP/1.1" 200 559 "-" "python-requests/2.26.0"

The job, back in dbt, does in fact show a recordcount... but I don't understand why I would need to pull back all the data in a model. In fact for a lot of my models, which have millions of rows, this would be super-not-good. Is there a reason it's doing this? Can I prevent it from doing this?
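The pattern in that access log can be sketched like this (illustrative Python only, not the adapter's actual code; `get_page` is a stand-in for the `GET /api/v3/job/<id>/results` call):

```python
# Illustrative sketch of the paginated results fetch visible in the
# access log: repeated calls to the job-results endpoint with an
# increasing offset until every row has been pulled back.
# `get_page(offset, limit)` stands in for one HTTP round trip and
# returns {"rows": [...]}, the shape the log suggests.

def fetch_all_rows(get_page, total_row_count, limit=100):
    """Drain a paginated job-results endpoint page by page."""
    rows = []
    offset = 0
    while offset < total_row_count:
        page = get_page(offset=offset, limit=limit)  # one round trip
        if not page["rows"]:
            break  # defensive: stop if the server returns an empty page
        rows.extend(page["rows"])
        offset += len(page["rows"])
    return rows
```

With millions of rows and a page size of 100, this is tens of thousands of HTTP round trips, which would line up with the idle time observed above.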

@awaschick
Author

awaschick commented Apr 12, 2023

More insight through irresponsible hacky-cheating: If I make edits to cursor.py in my installed dbt-dremio in dbt/adapters/dremio/api, I can effect the expected performance by commenting out the while expression in _populate_job_results that iterates over all rows in the materialized table, e.g.

            # while current_row_count < total_row_count:
            combined_job_results["rows"].extend(
                job_results(
                    self._parameters,
                    self._job_id,
                    offset=current_row_count,
                    limit=row_limit,
                )["rows"]
            )

Presumably, this hack has all kinds of negative edge cases for operations that do not involve a straight-up table materialization. In this use case, however, I got precisely the intended results: the job completed promptly, the table materialized fully, and the operation ran in 3 seconds.

Is there a way to forego this step for tasks that do not require fully consuming the content of the remote table?
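One way to forgo it, sketched hypothetically (the `fetch` flag and the `job_results` signature here are assumptions mirroring the snippet above, not the adapter's real interface):

```python
# Hypothetical sketch: gate the row-draining loop behind a `fetch`
# flag so DDL-style statements (e.g. a table materialization) skip it
# entirely, while result-returning queries still paginate as before.
# `job_results` mimics the paginated REST call from the snippet above.

def populate_job_results(job_results, parameters, job_id,
                         total_row_count, fetch=True, row_limit=100):
    combined = {"rows": []}
    if not fetch:
        return combined  # materializations don't need their rows back
    current_row_count = 0
    while current_row_count < total_row_count:
        page = job_results(parameters, job_id,
                           offset=current_row_count, limit=row_limit)
        combined["rows"].extend(page["rows"])
        current_row_count += len(page["rows"])
    return combined
```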

@ravjotbrar
Contributor

Hi @awaschick, just out of curiosity, how many columns does your test model have?

@awaschick
Author

Hey @ravjotbrar! My test model has 45 columns. It's very basic, it does a select * from an existing materialized table.

@andersbogsnes
Copy link

We're also facing similar issues: the actual job will be done, but dbt will wait for quite some time before returning. We have also experienced timeouts while waiting.

@Conq1

Conq1 commented Jun 16, 2023

As far as I can tell from my own debugging, the issue comes from using the run_query macro, which in turn calls the statement macro with fetch_result=True.
By dropping the run_query macro and using fetch_result=False in my own materialization, I got rid of these wait times, during which dbt was basically fetching all the data from the just-created model down to my local machine.

Edit: There have been a few times where this didn't work. From the debug view I could see it still fetching all the data.

@Conq1

Conq1 commented Jun 26, 2023

Looks like fetch is actually hardcoded to always be True? Is this a bug?

https://github.com/dremio/dbt-dremio/blob/09cceed52ab5c004811555172fe1208b51ed36dc/dbt/adapters/dremio/connections.py#L179C21-L179C21

@Conq1

Conq1 commented Jun 26, 2023

Sorry for the spam. Python is not my native language, so it takes some time to dig deep.
Regarding why fetch is hardcoded to True in the execute method: it kind of makes sense, since add_query(), which is called a couple of lines earlier, will always fetch everything. This indicates that some refactoring is necessary to get fetch_result to work as intended. One idea could be to only call _populate_job_results if fetch_result is set to True, as @awaschick mentions.
I do feel, though, that the big question is why the Dremio API gives you the created table as part of the result when you run a CREATE TABLE query through the API. When you run the same query in the console you only get a single-line response saying the table was created, which I feel is normal SQL behavior.

@Conq1

Conq1 commented Jun 30, 2023

Final post and I will shut up. Thought I would post my fix here.
cursor.py:

    def execute(self, sql, bindings=None, fetch=True):
        if self.closed:
            raise Exception("CursorClosed")
        if bindings is None:
            self._initialize()

            json_payload = sql_endpoint(self._parameters, sql, context=None)

            self._job_id = json_payload["id"]

            self._populate_rowcount()
            if fetch:
                self._populate_job_results()
            self._populate_results_table()

        else:
            raise Exception("Bindings not currently supported.")

connection.py:

    def add_query(self, sql, auto_begin=True, bindings=None, abridge_sql_log=False, fetch=True):

        connection = self.get_thread_connection()
        if auto_begin and connection.transaction_open is False:
            self.begin()

        logger.debug(f'Using {self.TYPE} connection "{connection.name}". fetch={fetch}')

        with self.exception_handler(sql):
            if abridge_sql_log:
                logger.debug("On {}: {}....".format(connection.name, sql[0:512]))
            else:
                logger.debug("On {}: {}".format(connection.name, sql))

            pre = time.time()
            cursor = connection.handle.cursor()

            if bindings is None:
                cursor.execute(sql, fetch=fetch)
            else:
                logger.debug(f"Bindings: {bindings}")
                cursor.execute(sql, bindings, fetch=fetch)

            logger.debug(
                "SQL status: {} in {:0.2f} seconds".format(
                    self.get_response(cursor), (time.time() - pre)
                )
            )
            return connection, cursor

connection.py:

    def execute(
        self, sql: str, auto_begin: bool = False, fetch: bool = False
    ) -> Tuple[AdapterResponse, agate.Table]:
        sql = self._add_query_comment(sql)
        _, cursor = self.add_query(sql, auto_begin, fetch=fetch)
        response = self.get_response(cursor)
        # fetch = True
        if fetch:
            table = cursor.table
        else:
            table = dbt.clients.agate_helper.empty_table()

        return response, table

@ravjotbrar ravjotbrar linked a pull request Sep 6, 2023 that will close this issue
1 task
ravjotbrar added a commit that referenced this issue Nov 14, 2023
### Summary

Before this change, execution time was idling even after the job was
finished in Dremio. This was happening because the adapter would fetch
unnecessary data from the materialized model.

### Description

This change makes it so the adapter only fetches data from the
materialized model if fetch is set to true.

### Test Results

All tests pass

### Changelog

-   [x] Added a summary of what this PR accomplishes to CHANGELOG.md

### Related Issue

#176