-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DATA-928] Fetch tabular data in CLI via multiple requests #2210
Conversation
cli/data.go
Outdated
if d == nil { | ||
continue | ||
|
||
fmt.Print("Downloading..") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this since this isn't parallelized/optimized, so useful to have some indication that there's progress.
break | ||
} | ||
for _, md := range mds { | ||
mdJSONBytes, err := protojson.Marshal(md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this could repeat metadata, e.g. if all data came from the same capture but is split across multiple requests. What happens is that we unnecessarily create additional metadata files, but the actual data only points to metadata_idx_0. Leaving as-is for now, since this is an issue between both binary and tabular data, and becomes more complicated with parallelization.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Since the backend sets a default limit of 50 if not passed, the TabularDataByFilter call was actually only returning a subset of documents. This fixes that bug and batches requests, since otherwise a large request will time out.
Testing
Tested locally with ~150K tabular datapoints and ensured they were all written into the ndjson with corresponding metadata.