Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine peekstream and bufferedReader together. #16114

Closed
wants to merge 6 commits into from

Conversation

hawkbee
Copy link
Contributor

@hawkbee hawkbee commented Jun 20, 2023

Resolve #4501

@ifreund ifreund added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Jun 22, 2023
@hawkbee hawkbee force-pushed the peekbuf branch 2 times, most recently from 63eefb0 to dff951e Compare June 30, 2023 10:23
@andrewrk
Copy link
Member

andrewrk commented Jul 10, 2023

Looks great! I would like to request 2 things before merging:

  1. What kind of testing did you perform on these changes?
  2. Can you provide any performance measurements on the changes to std.http?

Copy link
Contributor

@truemedian truemedian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a little bit of benchmarking I cannot find any meaningful difference in the speed difference of std.http parsing before and after this change (which makes sense, the overall structure of the buffering remains nearly identical).

lib/std/http/Client.zig Outdated Show resolved Hide resolved
lib/std/http/Server.zig Outdated Show resolved Hide resolved
lib/std/http/Server.zig Outdated Show resolved Hide resolved
@hawkbee
Copy link
Contributor Author

hawkbee commented Jul 11, 2023

  1. What kind of testing did you perform on these changes?

All the old test preserved and new add function had added basic tests. I mainly rely on the exist std.http tests, and try hard to passed all exist standalone std.http tests.

  1. Can you provide any performance measurements on the changes to std.http?

I need some guides how to acheive this.

From a little bit of benchmarking I cannot find any meaningful difference in the speed difference of std.http parsing before and after this change (which makes sense, the overall structure of the buffering remains nearly identical).

Thank @truemedian for benchmarking this change, and thank you for ground up work of std.http and review this pull request. The entire idea of this pull request came from the attempt to deduplicate the code of buffer management in connection, and reuse those for socket, file and the like.

@andrewrk
Copy link
Member

I'm sorry, I have one more review comment for you, @hawkbee. I would like to challenge the idea of adding BufferedReaderWriter. Can you justify why this is needed rather than using a combination of BufferedReader and BufferedWriter directly in std.http.Client?

@hawkbee
Copy link
Contributor Author

hawkbee commented Jul 14, 2023

I'm sorry, I have one more review comment for you, @hawkbee. I would like to challenge the idea of adding BufferedReaderWriter. Can you justify why this is needed rather than using a combination of BufferedReader and BufferedWriter directly in std.http.Client?

Many thanks for your review, very glad to see zig becoming better. As we see, the BufferedReaderWriter is very much a wrapper for BufferedReader and BufferedWriter , sort of like StreamSource for Reader, Writer and SeekableStream. In fact, in my Seeker interface prototype, I had tried add Seeker interface into the BufferedReaderWriter just like what StreamSource had did. Given that the underlying buffer is independent of each other, I think the only gain of this BufferedReaderWriter is I can get the object allocate by one allocator.create instead of two. If embed both two struct in connection, this also can achive, so the new interface is not a must. I will delete this interface if you wish.

Some other thoughts related: The buffer of BufferedReader/BufferedWriter is embed in this struct, so the memory is allocated on stack when used like this bufferedWriter , When I changed the stack allocated object to heap allocated, add a single BufferedReaderWriter can get both reader and writer buffer.

Now the reader buffer is 16K for TLS, somewhat small, but user may choose a big buffer size, that may overflow stack for small stack size machine. That voilate allocate small size memory on stack, big size memory on heap.

So my attention here is:

  1. keep the buffer static in struct and used by allocator.create .
  2. or make the buffer dynamic and change the interface allocator based,
  3. or keep the current usage case is fine like.

@hawkbee hawkbee marked this pull request as draft July 14, 2023 23:49
@hawkbee
Copy link
Contributor Author

hawkbee commented Jul 16, 2023

when test this branch to get https://github.com/MasterQ32/parser-toolkit/archive/master.tar.gz
use example program, the following ocurr.

thread 365352 panic: index out of bounds: index 16888, len 16645
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:1251:81: 0x382515 in finishRead2 (testhttp)
        @memcpy(c.partially_read_buffer[c.partial_ciphertext_idx + first.len ..][0..frag1.len], frag1);
                                                                                ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:1071:35: 0x37cac6 in readvAdvanced__anon_9983 (testhttp)
                return finishRead2(c, first, frag1, vp.total);
                                  ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:900:38: 0x37a889 in readvAtLeast__anon_9982 (testhttp)
        var amt = try c.readvAdvanced(stream, iovecs[vec_i..]);
                                     ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:861:24: 0x37a682 in readAtLeast__anon_9980 (testhttp)
    return readvAtLeast(c, stream, &iovecs, len);
                       ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:168:48: 0x2abcfc in readAtLeast (testhttp)
            .tls => conn.tls_client.readAtLeast(conn.stream, buffer, len),
                                               ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:183:32: 0x27806a in read (testhttp)
        return conn.readAtLeast(buffer, 1);
                               ^
/home/hequn/router/zig/zig/lib/std/io/buffered_reader.zig:247:56: 0x38a16f in readAtLeast (testhttp)
                    n = try self.unbuffered_reader.read(dest[dest_index..]);
                                                       ^
/home/hequn/router/zig/zig/lib/std/io/buffered_reader.zig:209:36: 0x2b4a9a in read (testhttp)
            return self.readAtLeast(dest, 1);
                                   ^
/home/hequn/router/zig/zig/lib/std/http/protocol.zig:542:52: 0x2b411e in read__anon_9003 (testhttp)
                        const nread = try conn.read(buffer[0..can_read]);
                                                   ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:575:53: 0x28017c in transferRead (testhttp)
            const amt = try req.response.parser.read(&req.connection.?.data.bufreader, buf[index..], req.response.skip);
                                                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:703:41: 0x26c663 in read (testhttp)
            else => try req.transferRead(buffer),
                                        ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:732:33: 0x26c285 in readAll (testhttp)
            const amt = try read(req, buffer[index..]);
                                ^
/home/hequn/router/zig/zig/testhttp.zig:35:36: 0x26d0e7 in main (testhttp)
        const amt = try req.readAll(&buf);

or frequently this

error: HttpChunkInvalid
/home/hequn/router/zig/zig/lib/std/http/protocol.zig:555:37: 0x2b48a8 in read__anon_9003 (testhttp)
                        .invalid => return error.HttpChunkInvalid,
                                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:575:25: 0x280197 in transferRead (testhttp)
            const amt = try req.response.parser.read(&req.connection.?.data.bufreader, buf[index..], req.response.skip);
                        ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:703:21: 0x26c687 in read (testhttp)
            else => try req.transferRead(buffer),
                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:732:25: 0x26c2a0 in readAll (testhttp)
            const amt = try read(req, buffer[index..]);
                        ^
/home/hequn/router/zig/zig/testhttp.zig:35:21: 0x26d11d in main (testhttp)
        const amt = try req.readAll(&buf);

@andrewrk Sorry to distrurb, could you please have a look the stacktrace to see if this is bug in TLS trigger on this branch or had introduced a bug corruption memory somewhere.

Edit: After the following change, both HTTP/HTTPS request download file and toWriter copy file works.

diff --git a/lib/std/crypto/tls/Client.zig b/lib/std/crypto/tls/Client.zig
index 37306dd37..edbe50830 100644
--- a/lib/std/crypto/tls/Client.zig
+++ b/lib/std/crypto/tls/Client.zig
@@ -43,7 +43,7 @@ application_cipher: tls.ApplicationCipher,
 /// 3. unused
 /// The fields `partial_cleartext_idx`, `partial_ciphertext_idx`, and
 /// `partial_ciphertext_end` describe the span of the segments.
-partially_read_buffer: [tls.max_ciphertext_record_len]u8,
+partially_read_buffer: [tls.max_ciphertext_record_len * 4]u8,
 
 /// This is an example of the type that is needed by the read and write
 /// functions. It can have any fields but it must at least have these
@@ -952,7 +952,7 @@ pub fn readvAdvanced(c: *Client, stream: anytype, iovecs: []const std.os.iovec)
     // too small to fit the cleartext, which may be as large as `max_ciphertext_len`.
     var cleartext_stack_buffer: [max_ciphertext_len]u8 = undefined;
     // Temporarily stores ciphertext before decrypting it and giving it to `iovecs`.
-    var in_stack_buffer: [max_ciphertext_len * 4]u8 = undefined;
+    var in_stack_buffer: [max_ciphertext_len]u8 = undefined;
     // How many bytes left in the user's buffer.
     const free_size = vp.freeSize();
     // The amount of the user's buffer that we need to repurpose for storing

hawkbee added 5 commits July 17, 2023 11:01
it does not gain extra benifits for adding this.
underlying buffer is smaller than tokenized token.

Add small buffer test BufferedReader tokenize to work as expected.

And BufferedReader peek method will work when least param
greater than zero.
@hawkbee
Copy link
Contributor Author

hawkbee commented Jul 17, 2023

May be the same as #15226, #15590?

@daurnimator
Copy link
Contributor

Some other thoughts related: The buffer of BufferedReader/BufferedWriter is embed in this struct, so the memory is allocated on stack when used like this bufferedWriter , When I changed the stack allocated object to heap allocated, add a single BufferedReaderWriter can get both reader and writer buffer.

Now the reader buffer is 16K for TLS, somewhat small, but user may choose a big buffer size, that may overflow stack for small stack size machine. That voilate allocate small size memory on stack, big size memory on heap.

So my attention here is:

1. keep  the buffer static in struct  and used by `allocator.create` .

2. or make the buffer dynamic and change the interface allocator based,

3. or keep the current usage case is fine [like](https://github.com/ziglang/zig/blob/3ec337484b80647ee23217533c1d62e914afa4f1/lib/std/net.zig#L1254).

FWIW, LinearFifo can swap between these different modes:

pub const LinearFifoBufferType = union(enum) {

You may want to look into implementing BufferedReader on top of LinearFifo

andrewrk added a commit that referenced this pull request Mar 14, 2024
See #16114. This functionality should be added to bufferedReader
instead.
@andrewrk
Copy link
Member

This never got to the point where I was comfortable merging it as-is, and now it needs to be redone. I have opened #19297 to remove PeekStream and I will update #4501 to reflect that change.

@andrewrk andrewrk closed this Mar 14, 2024
andrewrk added a commit that referenced this pull request Mar 14, 2024
See #16114. This functionality should be added to bufferedReader
instead.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Mar 20, 2024
See ziglang#16114. This functionality should be added to bufferedReader
instead.
@notcancername notcancername mentioned this pull request Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

provide a peeking API to std.io.BufferedReader
5 participants