Add support for ingesting/synthesizing custom binary data file #15
Comments
I just talked to a former colleague who works in statistical data processing. For interop reasons, they work with binary files containing the data as fixed-width rows of little-endian 16-bit integers, and sometimes 32-bit integers for larger value ranges. They could also make use of such a feature.
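For reference, here is a minimal Python sketch of that row layout using the standard `struct` module. It assumes unsigned values and a known column count; `write_rows` and `read_rows` are hypothetical helper names, not part of synth.

```python
import struct

def write_rows(rows, code="H"):
    """Pack equal-length rows of integers as little-endian fixed-width values.

    struct codes: "H"/"h" = unsigned/signed 16-bit, "I"/"i" = unsigned/signed 32-bit.
    """
    if not rows:
        return b""
    width = len(rows[0])
    row_struct = struct.Struct("<" + code * width)  # "<" = little-endian, no padding
    out = bytearray()
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same number of columns")
        out += row_struct.pack(*row)
    return bytes(out)

def read_rows(data, n_cols, code="H"):
    """Split a buffer back into rows; len(data) must be a multiple of the row size."""
    row_struct = struct.Struct("<" + code * n_cols)
    return [list(row) for row in row_struct.iter_unpack(data)]

rows = [[1, 2, 3], [65535, 0, 42]]
blob16 = write_rows(rows)       # 2 rows x 3 columns x 2 bytes = 12 bytes
blob32 = write_rows(rows, "I")  # same values as 32-bit integers = 24 bytes
```

Switching between the 16-bit and 32-bit variants is just a matter of changing the type code, since every row has the same width.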
@fretz12 thanks for this. This is a really interesting use case that requires some additional core features to be introduced to synth. Some notes:
@christoshadjiaslanis - Regarding 2, I only need the synthesized data to be written to a file. Bear with me if I'm making naive suggestions... but I'm thinking there would be a … Though I think our binary format is fairly simple, binary formats in general can vary wildly. One thought is that for hard-to-customize things like SerDes, we could perhaps use a plugin interface where the user writes their own (de)serializers and builds a .so or .dylib out of them; synth would then load those dynamic libs to execute the SerDes, with APIs binding binary -> …
@llogiq - thx!
Let's try that again: @all-contributors please add @fretz12 for awesome ideas.
I've put up a pull request to add @fretz12! 🎉
@fretz12 yeah, I think that for binary formats we need user-defined serializers. It's not a fully formed thought yet, but roughly speaking: we have our existing schema, which defines how data is generated, and we would need a second piece of config that dictates how that data is mapped to a binary serialization format. I'm not sure exactly how this would work. Perhaps we can create an RFC for this and try to design something that makes sense.
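To make the "second piece of config" idea concrete, here is a rough Python sketch. It assumes generated records arrive as dicts (for example parsed from JSON output) and that the mapping is an ordered list of (field name, `struct` type code) pairs. `BINARY_MAPPING` and `serialize` are hypothetical names, not an existing synth API.

```python
import struct

# Hypothetical mapping config: which record field goes where, and as what binary type.
BINARY_MAPPING = [
    ("id", "I"),           # unsigned 32-bit
    ("temperature", "h"),  # signed 16-bit
    ("valid", "?"),        # 1-byte bool
]

def serialize(records, mapping, byte_order="<"):
    """Serialize generated records into fixed-size binary entries per `mapping`."""
    entry = struct.Struct(byte_order + "".join(code for _, code in mapping))
    out = bytearray()
    for record in records:
        # A missing key raises KeyError instead of silently writing zeros.
        out += entry.pack(*(record[name] for name, _ in mapping))
    return bytes(out)

records = [{"id": 1, "temperature": -5, "valid": True}]
blob = serialize(records, BINARY_MAPPING)  # 4 + 2 + 1 = 7 bytes per record
```

Keeping the mapping separate from the generation schema means the same generated data could be written out under different binary layouts.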
Required Functionality
While binary data can come in many shapes and forms, the particular format I'm after is unencoded, uncompressed binary data with different fields packed next to each other. Additionally, the file begins with a header and ends with a footer. In the middle is the payload data, in which entries are repeated many times.
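A minimal sketch of splitting such a file, assuming the header, footer and entry sizes are known in bytes up front (for example from a user-supplied schema). `split_file` is a hypothetical helper.

```python
def split_file(data, header_size, footer_size, entry_size):
    """Split raw file bytes into (header, list of entries, footer)."""
    if header_size + footer_size > len(data):
        raise ValueError("file is shorter than header + footer")
    end = len(data) - footer_size  # avoids the data[-0:] pitfall when footer_size == 0
    header, payload, footer = data[:header_size], data[header_size:end], data[end:]
    if len(payload) % entry_size != 0:
        raise ValueError("payload is not a whole number of entries")
    entries = [payload[i:i + entry_size] for i in range(0, len(payload), entry_size)]
    return header, entries, footer

data = b"HDR" + b"\x01\x02" * 3 + b"END"
header, entries, footer = split_file(data, header_size=3, footer_size=3, entry_size=2)
```

Checking that the payload is a whole number of entries catches a wrong header/footer size early, before any field decoding happens.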
Here is a diagram of such a format:
Each entry has a fixed size and can contain multiple fields of different data types, each occupying a different number of bytes. Example:
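To illustrate how fields of different types pack into one fixed-size entry, here is a sketch with a made-up layout (the field names and types are hypothetical, chosen only for illustration). It computes each field's byte offset and decodes a single little-endian entry with Python's `struct` module.

```python
import struct

# Hypothetical entry layout: (field name, struct type code).
FIELDS = [
    ("opcode", "B"),      # 1 byte, unsigned
    ("flags", "B"),       # 1 byte, unsigned
    ("command_id", "H"),  # 2 bytes, unsigned
    ("lba", "Q"),         # 8 bytes, unsigned
    ("latency_us", "f"),  # 4 bytes, IEEE 754 single precision
]

def field_offsets(fields):
    """Return ({name: (offset, size)}, total entry size) for a tightly packed layout."""
    offsets, pos = {}, 0
    for name, code in fields:
        size = struct.calcsize("<" + code)  # "<": standard sizes, no alignment padding
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos

def decode_entry(entry, fields):
    """Decode one fixed-size little-endian entry into a dict."""
    fmt = "<" + "".join(code for _, code in fields)
    return dict(zip((name for name, _ in fields), struct.unpack(fmt, entry)))

offsets, entry_size = field_offsets(FIELDS)  # entry_size is 16 bytes here
```

The `<` prefix matters: in `struct`'s default native mode, alignment padding may be inserted (for example before the 8-byte `lba` field), which would not match a tightly packed on-disk layout.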
Proposed Solution
The user would be required to supply additional schema information telling synth how to parse the fields. A possible format might look something like this:
Such a binary schema could also be used to define extensions in the future, like encodings, variable-length data, etc.
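As one hypothetical illustration of such a binary schema (the key names and type names below are invented for this sketch and are not a proposed synth format), a small Python function can validate it and compile it into a `struct.Struct` describing one entry.

```python
import struct

# Hypothetical binary schema; keys and type names are illustrative only.
SCHEMA = {
    "byte_order": "little",
    "header_size": 64,   # bytes, used when splitting the file
    "footer_size": 16,   # bytes, used when splitting the file
    "fields": [
        {"name": "opcode", "type": "u8"},
        {"name": "command_id", "type": "u16"},
        {"name": "lba", "type": "u64"},
        {"name": "latency_us", "type": "f32"},
    ],
}

TYPE_CODES = {"u8": "B", "i8": "b", "u16": "H", "i16": "h", "u32": "I",
              "i32": "i", "u64": "Q", "i64": "q", "f32": "f", "f64": "d"}
ORDER_PREFIX = {"little": "<", "big": ">"}

def compile_schema(schema):
    """Turn the schema into a struct.Struct for one entry, rejecting unknown types."""
    codes = []
    for field in schema["fields"]:
        if field["type"] not in TYPE_CODES:
            raise ValueError(f"unsupported type {field['type']!r} for field {field['name']!r}")
        codes.append(TYPE_CODES[field["type"]])
    return struct.Struct(ORDER_PREFIX[schema["byte_order"]] + "".join(codes))

entry = compile_schema(SCHEMA)  # 1 + 2 + 8 + 4 = 15 bytes per entry
```

Extensions such as encodings or variable-length data would need new keys and a parser beyond `struct`, which only handles fixed-size layouts.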
Synth should be able to take such a schema and a data file, infer from them, and output synthetic variants of the fields. A nice-to-have would be to take the original data file's header and footer and copy them into the generated file as-is.
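A deliberately naive Python sketch of that flow: copy the original header and footer verbatim and sample each field uniformly within the min/max observed in the original entries. The min/max "inference" is an assumption made for brevity and stands in for whatever modelling synth would actually do; `synthesize` and its parameters are hypothetical.

```python
import random
import struct

def synthesize(original, header_size, footer_size, fmt, n_entries, seed=None):
    """Return original header + n_entries synthetic entries + original footer.

    fmt is a struct format string for one entry, e.g. "<HH".
    """
    entry = struct.Struct(fmt)
    end = len(original) - footer_size
    header, payload, footer = original[:header_size], original[header_size:end], original[end:]
    rows = list(entry.iter_unpack(payload))  # raises struct.error on a partial entry
    if not rows:
        raise ValueError("need at least one entry to infer value ranges")
    # Per-field (min, max) over all original entries.
    ranges = [(min(col), max(col)) for col in zip(*rows)]
    rng = random.Random(seed)
    out = bytearray(header)
    for _ in range(n_entries):
        values = [
            rng.uniform(lo, hi) if isinstance(lo, float) else rng.randint(lo, hi)
            for lo, hi in ranges
        ]
        out += entry.pack(*values)
    out += footer
    return bytes(out)

original = b"HEAD" + struct.pack("<HH", 10, 100) + struct.pack("<HH", 20, 200) + b"TAIL"
synthetic = synthesize(original, header_size=4, footer_size=4, fmt="<HH", n_entries=5, seed=1)
```

Because the header and footer bytes are copied untouched, any checksums or counts stored in them would not be updated for the new payload; that would need format-specific handling.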
Use case
The use case pertains to protocol data files used in the storage industry. NVMe is one example. Other storage and networking protocols typically follow such a format to some degree, as well.