Unique type definitions can consume all memory #4894

Open
philrz opened this issue Nov 22, 2023 · 0 comments

At the time this issue is being opened, Zed is at commit 1ec7052.

@mattnibs recently pointed out that the kinds of changes in #4555 make it particularly easy to cause Zed to consume all available memory by feeding it lots of unique type definitions. For an easy repro, consider the following script.

$ cat manytypes.py 
#!/usr/bin/env python3

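# Emit an endless stream of records, each with a unique field name,
# so that every record has a distinct type.
num = 1
while True:
    print('{"' + str(num) + '": ' + str(num) + '}')
    num += 1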

When run on an AWS t2.xlarge (16 GB of memory), the pipeline gets past 19 million values before all memory is consumed, and then the system hangs.

$ ./manytypes.py | zq -z -
...
{"19198499":19198499}
{"19198500":19198500}
{"19198501":19198501}
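To illustrate why a stream like this can exhaust memory, here is a minimal Python sketch, assuming a type table that interns every distinct record type it sees and never evicts entries. The names `TypeTable`, `TypeLimitError`, and `max_types` are hypothetical and do not correspond to Zed's actual implementation. The optional cap only shows how an explicit limit could turn a hang into an error.

```python
# Hypothetical model of a type table that interns every distinct record
# type it sees and never evicts entries. This is NOT Zed's implementation;
# it only illustrates why a stream of unique field names grows memory.


class TypeLimitError(Exception):
    """Raised when the hypothetical table exceeds its configured limit."""


class TypeTable:
    def __init__(self, max_types=None):
        # max_types=None means unbounded (the situation in the repro);
        # an integer turns runaway growth into an explicit error.
        self.max_types = max_types
        self._ids = {}  # maps a type signature to a small integer ID

    def intern(self, record):
        # A record's "type" here is its tuple of (field name, value type).
        sig = tuple((k, type(v).__name__) for k, v in record.items())
        if sig not in self._ids:
            if self.max_types is not None and len(self._ids) >= self.max_types:
                raise TypeLimitError(f"more than {self.max_types} distinct types")
            self._ids[sig] = len(self._ids)
        return self._ids[sig]

    def __len__(self):
        return len(self._ids)


def unique_records(n):
    # Same shape as the repro script: {"1": 1}, {"2": 2}, ...
    for num in range(1, n + 1):
        yield {str(num): num}


if __name__ == "__main__":
    table = TypeTable()
    for rec in unique_records(100_000):
        table.intern(rec)
    print(len(table))  # one table entry per input record

    capped = TypeTable(max_types=1000)
    try:
        for rec in unique_records(100_000):
            capped.intern(rec)
    except TypeLimitError as e:
        print("stopped:", e)
```

In this model every input record adds a table entry, so memory grows linearly with the input, whereas records that repeat a field name reuse an existing entry.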

We can certainly document this as a known limitation and encourage users to structure their data in ways that avoid it (e.g., by using a Zed "map" type). However, this kind of data is legal in formats like JSON, and I think Zed currently needs to read such JSON data in full before it can turn it into a map. Also, jq doesn't have this limitation: in a test, I observed its memory usage stay flat at 872 KB while reaching the same point, which is unsurprising given its "stateless dataflow" approach. Therefore, it might be worth finding a way to tolerate this kind of input and/or fail more gracefully.
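As an illustration of restructuring data so it doesn't bump into this, here is a hedged Python sketch that reshapes the repro's records before they reach zq, moving each field name into the data so that every output record has the same shape. This is not Zed's map type itself, just the same idea applied upstream; the function names and the `key`/`value` field names are hypothetical choices.

```python
import json


def to_fixed_shape(record):
    # Turn {"<name>": <value>, ...} into a list of {"key": ..., "value": ...}
    # records. Field names become data, so the record type no longer varies.
    # "key"/"value" are arbitrary illustrative names.
    return [{"key": k, "value": v} for k, v in record.items()]


def stream(n):
    # Same records as the repro script, but reshaped before printing.
    for num in range(1, n + 1):
        for row in to_fixed_shape({str(num): num}):
            print(json.dumps(row))


if __name__ == "__main__":
    stream(3)
```

Running it prints lines such as `{"key": "1", "value": 1}`, all sharing a single record type, assuming the integer values stay integers.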
