-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack overflow on Windows #2857
Comments
Ah, great discovery! Does |
No, it seems working. ❯ ./prqlc.exe parse
Enter PRQL, then press ctrl-z to compile:
from employees
filter gross_cost > 0
take 20
sort ct
filter ct > 200
sort ct
take 20
filter ct > 200
sort ct
filter ct > 200
sort ct
filter ct > 200
sources:
'':
- name: main
VarDef:
value:
Pipeline:
exprs:
- FuncCall:
name:
Ident:
- from
args:
- Ident:
- employees
- FuncCall:
name:
Ident:
- filter
args:
- Binary:
left:
Ident:
- gross_cost
op: Gt
right:
Literal: !Integer 0
- FuncCall:
name:
Ident:
- take
args:
- Literal: !Integer 20
- FuncCall:
name:
Ident:
- sort
args:
- Ident:
- ct
- FuncCall:
name:
Ident:
- filter
args:
- Binary:
left:
Ident:
- ct
op: Gt
right:
Literal: !Integer 200
- FuncCall:
name:
Ident:
- sort
args:
- Ident:
- ct
- FuncCall:
name:
Ident:
- take
args:
- Literal: !Integer 20
- FuncCall:
name:
Ident:
- filter
args:
- Binary:
left:
Ident:
- ct
op: Gt
right:
Literal: !Integer 200
- FuncCall:
name:
Ident:
- sort
args:
- Ident:
- ct
- FuncCall:
name:
Ident:
- filter
args:
- Binary:
left:
Ident:
- ct
op: Gt
right:
Literal: !Integer 200
- FuncCall:
name:
Ident:
- sort
args:
- Ident:
- ct
- FuncCall:
name:
Ident:
- filter
args:
- Binary:
left:
Ident:
- ct
op: Gt
right:
Literal: !Integer 200
ty_expr: null
kind: Main
annotations: []
source_ids:
0: '' ❯ ./prqlc.exe compile
Enter PRQL, then press ctrl-z to compile:
from employees
filter gross_cost > 0
take 20
sort ct
filter ct > 200
sort ct
take 20
filter ct > 200
sort ct
filter ct > 200
sort ct
filter ct > 200
thread 'main' has overflowed its stack |
Oh, weird, thanks. So it's likely in "our" code rather than Chumsky. If you have the patience (no problem if you're busy; I don't have access to a Windows machine unfortunately), would you mind running with And if that's unclear, we could see how far we can get before overflowing — for example whether |
Sure. ❯ $env:RUST_LOG='trace'
❯ $env:RUST_BACKTRACE='full'
❯ ./prqlc.exe compile
Enter PRQL, then press ctrl-z to compile:
from employees
filter gross_cost > 0
take 20
sort ct
filter ct > 200
sort ct
take 20
filter ct > 200
sort ct
filter ct > 200
sort ct
filter ct > 200
Also does not work. |
Thanks! It looks like it's failing somewhere in the resolver; the final trace is at prql/prql-compiler/src/semantic/resolver.rs Line 199 in b129914
I wouldn't have thought that there are enough items on the stack at that point to get close to overflow. But it also presumably can't enter an infinite loop given it on overflows on Windows... |
Great find @eitsupi ! That seems likely. So I guess one of our structs is huge (and this isn't fortran!). So it's about the size of objects on the stack, rather than having a very long call-chain. I'll have a look at what this might be (unless @aljazerzen sees this and have a good idea what this is likely to be). And then probably we can just put that thing in a |
I'm not sure if this is fixed, can someone on Windows verify? |
As I wrote here (#2878 (comment)), I tried several queries. Is there anything else I can do? |
I don't know how to debug this. If you are comfortable with Rust and traits, you I have a few ideas that might work, but require tweaking the parser. Which is always fun \s. |
Thanks for trying. I hadn't realized that was with the current version.
One thing we could do @eitsupi is to add these as tests, to (I can do that if you don't get to it) That said, I do think this will be difficult without the unification of a Windows box and @aljazerzen 's mind! I can have a go though, maybe bring a fresh perspective... We can also rent Windows VMs — the cost is the setup rather than the $ cost |
Oh, I do have a windows box, but I'm not willing to pay the setup cost. And also I have ideological objections to using windows :D |
As I wrote #2894 (comment), it seems still failing. |
This problem seems to still occur, but has definitely been improved by #2878. I have confirmed that this query, where this problem was discovered, also compiles correctly. |
I did some quick diagnostics. Using this diff from the current diff --git a/prql-compiler/prqlc/tests/project/Project.prql b/prql-compiler/prqlc/tests/project/Project.prql
index 1fc6a97f..1b44377f 100644
--- a/prql-compiler/prqlc/tests/project/Project.prql
+++ b/prql-compiler/prqlc/tests/project/Project.prql
@@ -6,3 +6,4 @@ let favorite_artists = [
favorite_artists
join artists.input side:left (==artist_id)
+join long_query.long_query side:left (input.title == long_query.title)
diff --git a/prql-compiler/prqlc/tests/project/long_query.prql b/prql-compiler/prqlc/tests/project/long_query.prql
new file mode 100644
index 00000000..9f64f091
--- /dev/null
+++ b/prql-compiler/prqlc/tests/project/long_query.prql
@@ -0,0 +1,25 @@
+# TODO: move this into a separate test; but putting here because of the
+# `insta_cmd` issues, and because we're in a rush to ensure Windows builds
+# aren't randomly breaking.
+
+let long_query = (
+ from employees
+ filter gross_cost > 0
+ group {title} (
+ aggregate {
+ ct = count s"*",
+ }
+ )
+ sort ct
+ filter ct > 200
+ take 20
+ sort ct
+ filter ct > 200
+ take 20
+ sort ct
+ filter ct > 200
+ take 20
+ sort ct
+ filter ct > 200
+ take 20
+)
diff --git a/prql-compiler/prqlc/tests/test.rs b/prql-compiler/prqlc/tests/test.rs
index d495b301..37c65cd1 100644
--- a/prql-compiler/prqlc/tests/test.rs
+++ b/prql-compiler/prqlc/tests/test.rs
@@ -78,77 +78,7 @@ fn test_compile_project() {
cmd.arg(project_path());
cmd.arg("-");
cmd.arg("main");
- assert_cmd_snapshot!(cmd, @r###"
- success: true
- exit_code: 0
- ----- stdout -----
- WITH table_1 AS (
- SELECT
- 120 AS artist_id,
- DATE '2023-05-18' AS last_listen
- UNION
- ALL
- SELECT
- 7 AS artist_id,
- DATE '2023-05-16' AS last_listen
- ),
- favorite_artists AS (
- SELECT
- artist_id,
- last_listen
- FROM
- table_1
- ),
- table_0 AS (
- SELECT
- *
- FROM
- read_parquet('artists.parquet')
- ),
- input AS (
- SELECT
- *
- FROM
- table_0
- )
- SELECT
- favorite_artists.artist_id,
- favorite_artists.last_listen,
- input.*
- FROM
- favorite_artists
- LEFT JOIN input ON favorite_artists.artist_id = input.artist_id
-
- ----- stderr -----
- "###);
-
- let mut cmd = prqlc_command();
- cmd.args(["compile", "--hide-signature-comment"]);
- cmd.arg(project_path());
- cmd.arg("-");
- cmd.arg("favorite_artists");
- assert_cmd_snapshot!(cmd, @r###"
- success: true
- exit_code: 0
- ----- stdout -----
- WITH table_0 AS (
- SELECT
- 120 AS artist_id,
- DATE '2023-05-18' AS last_listen
- UNION
- ALL
- SELECT
- 7 AS artist_id,
- DATE '2023-05-16' AS last_listen
- )
- SELECT
- artist_id,
- last_listen
- FROM
- table_0
-
- ----- stderr -----
- "###);
+ assert_cmd_snapshot!(cmd);
}
#[test]
@@ -162,7 +92,7 @@ fn test_format() {
----- stdout -----
----- stderr -----
- Currently `fmt` only works with a single source, but found multiple sources: "`Project.prql`, `artists.prql`"
+ Currently `fmt` only works with a single source, but found multiple sources: "`Project.prql`, `artists.prql`, `long_query.prql`"
"###);
}
@@ -189,8 +119,8 @@ fn prqlc_command() -> Command {
// `--color=never` flag.
cmd.env_remove("CLICOLOR_FORCE");
// We don't want the tests to be affected by the user's `RUST_BACKTRACE` setting.
- cmd.env_remove("RUST_BACKTRACE");
- cmd.env_remove("RUST_LOG");
+ // cmd.env_remove("RUST_BACKTRACE");
+ // cmd.env_remove("RUST_LOG");
cmd.args(["--color=never"]);
cmd
}
diff --git a/prql-compiler/src/lib.rs b/prql-compiler/src/lib.rs
index dacfd237..aeb270df 100644
--- a/prql-compiler/src/lib.rs
+++ b/prql-compiler/src/lib.rs
@@ -65,7 +65,7 @@
//! $ prqlc compile query.prql
//! ```
//!
-
+#![feature(backtrace_frames)]
#![forbid(unsafe_code)]
// Our error type is 128 bytes, because it contains 5 strings & an Enum, which
// is exactly the default warning level. Given we're not that performance
diff --git a/prql-compiler/src/semantic/module.rs b/prql-compiler/src/semantic/module.rs
index 97b08a17..b86c7693 100644
--- a/prql-compiler/src/semantic/module.rs
+++ b/prql-compiler/src/semantic/module.rs
@@ -1,4 +1,7 @@
-use std::collections::{HashMap, HashSet};
+use std::{
+ backtrace,
+ collections::{HashMap, HashSet},
+};
use anyhow::{bail, Result};
use itertools::Itertools;
@@ -201,6 +204,12 @@ fn lookup_in(module: &Module, ident: Ident) -> HashSet<Ident> {
}
log::trace!("lookup: {ident}");
+ let bt = backtrace::Backtrace::capture();
+ let len = bt.frames().len();
+ log::trace!("{}", len);
+ if len >= 120 {
+ log::trace!("{:?}", bt);
+ }
let mut res = HashSet::new();
And then running with:
We can then look at how deep the stack traces are with:
I then set some bound at which to log the whole stack trace (currently 120 deep), and then looked at one:
It looks like the resolver goes very deep. Is that because it structures ...rather than working like an intepreter, evaluating If so — that's an elegant approach, but maybe it scales badly with program size? I need to sign off for a bit, and probably won't be doing much coding in the next couple of days. I can look at this more after that though! |
One way of emulating this on a Mac — run At |
FYI this was fixed and then started reoccurring again after #3786. I'll try and open a PR with the test reenabled to track if it's still an issue |
@aljazerzen I've been thinking about this some more. Is this a fundamental limitation with our compiling approach? Or is there some way of getting around it?
|
I doubt this is a fundamantally bad approach - I've parsed enormous PRQL files with a relatively small stack size, so I don't think it applies to Linux too. |
OK interesting. Though FYI am guessing this is the case for Linux too:
...in which case this is a magnitude rather than a categorical issue. (only in dev mode though, not with |
1MB of stack memory is not that much. When I was trying this out and have limited stack size on Linux, prqlc started failing in resolver, so I concluded it is not a problem at all. I was using a test file where I just added a bunch of |
There is only one solution to this problem: install Windows somewhere and debug it. Until we try that there is no point in trying other solutions. |
IIUC Windows has a 1-2MB default stack size, which is why we get this problem on Windows or when reducing the stack size on Mac...
Yes, we have this test which does something similar: https://github.com/PRQL/prql/pull/4073/files
If I set a similar stack size to Windows' I get the same error, described here: #2857 (comment). And that shows that we're 120 frames deep when overflowing the stack, mostly from the resolver. I don't know if that's bad per se, but it does seem to be the cause of the issue. Isn't that a pretty good test? Or there's a case for it being different on Windows? |
I don't know if it is, because we don't know what's really the problem. I've setup a Windows VM, downloaded git, rust toolchain and prql, compiled prqlc and parsed the test file. I'm not able to reproduce this error.
Maybe it is a problem of available memory size? |
!!
Does |
The test does fail. Which is strange, so I re-ran it from the command line with A few observations:
|
So stack memory usage of our compiler grows linearly with call depth of the PRQL program. Which is not ideal and but would be hard to change. We can change how fast does it grow with call depth and debugging a little, it looks like we do have a bunch of variables on the stack that could be on the heap. Will optimize a little. |
PS: the stack overflows 658 calls deep (with |
Interesting; this is a bit different from the old location Edit: updated in #4104 |
A slicker version of PRQL#2857 (comment) in case that's helpful. Run with: ``` RUST_BACKTRACE=1 RUST_LOG=trace cargo +nightly test -p prqlc --lib -- long_query_inline --nocapture ``` ...may want to redirect to a file since it prints a lot...
It depends on exactly when the stack is filled, so it will vary with every new variable in a function that requires stack memory. |
See #2700 (comment) and #2713 and #2781 (comment)
I tried a few simple examples with the latest prqlc.exe built from main (8f9b778), and I suspect that this has nothing to do with
count
and everything to do with the number offilter
andtake
andsort
.Working
Not working
Originally posted by @eitsupi in #2717 (comment)
The text was updated successfully, but these errors were encountered: