-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream to Seq #131
Stream to Seq #131
Conversation
See #129. |
That's great. The only problem I see is with stdcompat. After seing how it is built and maintained (autotools, no dune, which creates issues for us using it in the duniverse and is also kinda a large amount of code to depend on) I would much prefer a different solution. Given we infamously use CPPO we could have our own Seq module for older versions of OCaml or something. |
Actually I think I could use c-cube/seq instead, I can update the PR, but only later in the day (or feel free to do it if you want). |
That's a good point, @c-cube's |
CHANGES.md
Outdated
@@ -25,6 +25,8 @@ | |||
|
|||
- The function `to_file` now adds a newline at the end of the generated file. An | |||
optional argument allows to return to the original behaviour (#124, @panglesd) | |||
- The stream_from_* and stream_to_* functions now use a Seq.t instead of a Stream.t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The stream_from_* and stream_to_* functions now use a Seq.t instead of a Stream.t, | |
- The `stream_from_*` and `stream_to_*` functions now use a Seq.t instead of a Stream.t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also quoted the other identifiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, reopening this since the force-push undid my change.
Something I'm wondering about is whether it's worth it trying to make the seq persistent. It's not too hard to do, for a small cost compared to json decoding, and it can be convenient if one uses the let rec stream_to_seq ic : t Seq.t =
let r = ref None in
fun () ->
match !r with
| Some node -> node
| None ->
let node =
try
let json = decode_json ic in
Seq.Cons (j, stream_to_seq ic)
with End_of_file -> Seq.Nil
in
r := Some node;
node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things I have encountered while going through the code.
5d7091c
to
fa8e5f2
Compare
@c-cube I tried to implement your suggestion, it's a bit more work because we need to handle exceptions as well. I got the following version which is 2x slower and allocates 3x more. type 'a cache =
| Empty
| Full of 'a
| Fail of exn
let seq_from_lexbuf v ?(fin = fun () -> ()) lexbuf =
let stream = Some true in
let rec f lexbuf =
let cache = ref Empty in
fun () ->
match !cache with
| Full node -> node
| Fail e -> raise e
| Empty ->
let node =
try Seq.Cons (from_lexbuf v ?stream lexbuf, f lexbuf)
with
| End_of_input ->
fin ();
Seq.Nil
| e ->
let exn =
try ignore (fin ()); e
with fin_e -> Finally (e, fin_e)
in
cache := Fail exn;
raise exn
in
cache := Full node;
node
in
f lexbuf I think it's simpler to stay with the non-persistent version, and let users convert it explicitly to a persistent version if they need to. (I expect that they rarely do.) Note: (even before @c-cube chimed in,) I was wondering whether one of containers' generator/iterator types (those with type |
@gasche I'm surprised by the overhead, but so be it :-). I think it's indeed valuable to use |
(Note: I'm not working from a flambda switch.) |
Most people aren't, so I think it's reasonable 😁 |
By coincidence I stumbled on this PR, I was looking into what streaming functions yojson had to offer and because this PR was just opened, I wanted to try it. Now I have an issue with If I run the exact same code but with Any idea if I'm doing something wrong, or if something else is going on? |
Here is a minimal (I'm using base) that fails: let test () =
let seq = Sequence.of_list [`String "a"; `String "b"; `String "c"] |> Sequence.to_seq in
Yojson.Basic.seq_to_string seq |> Stdio.print_endline;
Yojson.Basic.seq_to_file "test.json" seq This prints "a""b""c" on terminal, but "test.json" is empty. |
This does look like a bug indeed, that (according to my test) was already present in the |
lib/write.ml
Outdated
let ob = | ||
match buf with | ||
None -> Buffer.create len | ||
| Some ob -> ob | ||
in | ||
stream_to_buffer ?std ob st | ||
seq_to_buffer ?std ob st |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is missing a line to actually write the buffer Buffer.output_buffer oc ob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I just found the same fix. I think it should be fixed separately from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is okay to add this to this PR, but it is up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fix should be the suggestion made by @gasche in the comments, otherwise the seq
has no real value as there is no memory improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to send a separate PR with the fix, and probably the streaming improvement as well, but probably not before tomorrow. I think the current PR should stay focused on just replacing Streaming with Seq, preserving the behavior exactly.
There are other things that look fishy in the buffer-handling logic of write.mll
(I suspect that it could sometimes include the existing data of the buffer in the output, which is not intended), so it would be nice to review that as well, maybe I'll do it.
@gasche just found it - see my review - it's missing a line to write the buffer. At least that is how the regular |
I've noticed something else that's strange (so after I fixed writing the buffer as well). I'm not sure if it's an issue with this PR or something I'm not understanding correctly. When I'm doing Isn't the goal of the (if you prefer this question somewhere else or as bug reports, let me know) |
Could you try the following implementation? let seq_to_channel ?buf ?(len=2096) ?std oc st =
let ob =
match buf with
None -> Buffer.create len
| Some ob -> ob
in
Seq.iter (fun v ->
to_buffer ?std ob v;
Buffer.output_buffer oc ob;
Buffer.clear ob;
) st |
Awesome, that fixes it! ❤️ |
22d0aed
to
a73dc13
Compare
The CI failed on <4.07 versions, I think it's because I forgot to add the |
I would suggest going with #133 first. I took a look at it and it is ready to be merged in my opinion (despite the failure on 4.14 in CI, but this is gonna be handled in the PR) then rebase this PR on it and then tackle #132 after this is merged. Thanks for your effort @gasche and @tcoopman, much appreciated! There's apparently big parts of Yojson that are under-utilized/under-tested so it is great that these things are getting reported and fixed. |
a73dc13
to
5ae68e8
Compare
Done! |
Looks like |
5ae68e8
to
8a60104
Compare
I changed the code to avoid |
|
8a60104
to
5c5ba07
Compare
Damn! Sorry for the noise, I did another attempt with just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I think we got all concerns addressed. Thanks for your work @gasche!
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
Before:
After: