Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add stacker and maybe_grow on recursion guard #1468

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Eason0729
Copy link
Contributor

Changed

under #[cfg(feature="std")], stacker::maybe_grow was called in RecursionCounter::try_decrease to prevent stack overflow

I also tested on Windows(stack size 1MB), it seems fine.

reproducible code from original issue

use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    ctx.register_parquet(
        "parquet_table",
        "batches.parquet",
        ParquetReadOptions::default(),
    )
    .await?;

    let sql_query = "
    SELECT seq
    FROM (
      SELECT seq,
        LEAD(seq) OVER (ORDER BY seq) AS next_seq
        FROM parquet_table
    ) AS subquery
    WHERE next_seq - seq > 1";

    let df = ctx.sql(sql_query).await?;

    df.show().await?;
    Ok(())
}

Missing

  • Return error when growing stack fail:
    growing stack on unsupported target seems to be no-op on stacker side, so I am unable to detect it.
  • Test: which part should be tested?
  • Observe performance impact

@Eason0729
Copy link
Contributor Author

I evaluated stacker as part of sqlparser and I thought it was doing some too crazy stuff that made it hard to use in embedded / wasm environments. Maybe that is better now

Originally posted by @alamb in apache/datafusion#9375 (comment)

I think the use of stacker here is fine under #[cfg(feature="std")], and I totally forget to add stacker as optional dependency 😄 .

@Eason0729
Copy link
Contributor Author

I have add stacker as optional dependency, so it won't be used in no std environment.

*It will continue to stack overflow in this case.

What's your opinion on the use of stacker? @alamb

@alamb
Copy link
Contributor

alamb commented Oct 15, 2024

I have add stacker as optional dependency, so it won't be used in no std environment.

*It will continue to stack overflow in this case.

What's your opinion on the use of stacker? @alamb

I think stacker is a clever hack but we should fix the code for real to avoid deeply nested callstacks rather than hoping that a third-party library will bail us out.

I don't think it is a good idea to use stacker as I don't want to be responsible for trying to debug any issues caused by its use. That may sound selfish, but I am already at my cognitive complexity load for sqlparser (and various other crates)

@Eason0729
Copy link
Contributor Author

That's okay. (I think this PR as a draft to demonstrate what I would like to change

Then I would try to reduce stack usage with another PR.

@Eason0729
Copy link
Contributor Author

Can I mark this PR as draft again or just create a new one?

@iffyio
Copy link
Contributor

iffyio commented Oct 18, 2024

@Eason0729 thanks for looking to improve this behavior! I think should be good to create a new PR (we can link also this PR in the description since they're related)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants