Combine peekstream and bufferedReader together. #16114

hawkbee · 2023-06-20T12:12:33Z

Resolve #4501

andrewrk · 2023-07-10T17:55:27Z

Looks great! I would like to request 2 things before merging:

What kind of testing did you perform on these changes?
Can you provide any performance measurements on the changes to std.http?

truemedian

From a little bit of benchmarking I cannot find any meaningful difference in the speed difference of std.http parsing before and after this change (which makes sense, the overall structure of the buffering remains nearly identical).

lib/std/http/Client.zig

lib/std/http/Server.zig

hawkbee · 2023-07-11T13:52:38Z

What kind of testing did you perform on these changes?

All the old test preserved and new add function had added basic tests. I mainly rely on the exist std.http tests, and try hard to passed all exist standalone std.http tests.

Can you provide any performance measurements on the changes to std.http?

I need some guides how to acheive this.

From a little bit of benchmarking I cannot find any meaningful difference in the speed difference of std.http parsing before and after this change (which makes sense, the overall structure of the buffering remains nearly identical).

Thank @truemedian for benchmarking this change, and thank you for ground up work of std.http and review this pull request. The entire idea of this pull request came from the attempt to deduplicate the code of buffer management in connection, and reuse those for socket, file and the like.

andrewrk · 2023-07-13T16:44:12Z

I'm sorry, I have one more review comment for you, @hawkbee. I would like to challenge the idea of adding BufferedReaderWriter. Can you justify why this is needed rather than using a combination of BufferedReader and BufferedWriter directly in std.http.Client?

hawkbee · 2023-07-14T02:16:55Z

I'm sorry, I have one more review comment for you, @hawkbee. I would like to challenge the idea of adding BufferedReaderWriter. Can you justify why this is needed rather than using a combination of BufferedReader and BufferedWriter directly in std.http.Client?

Many thanks for your review, very glad to see zig becoming better. As we see, the BufferedReaderWriter is very much a wrapper for BufferedReader and BufferedWriter , sort of like StreamSource for Reader, Writer and SeekableStream. In fact, in my Seeker interface prototype, I had tried add Seeker interface into the BufferedReaderWriter just like what StreamSource had did. Given that the underlying buffer is independent of each other, I think the only gain of this BufferedReaderWriter is I can get the object allocate by one allocator.create instead of two. If embed both two struct in connection, this also can achive, so the new interface is not a must. I will delete this interface if you wish.

Some other thoughts related: The buffer of BufferedReader/BufferedWriter is embed in this struct, so the memory is allocated on stack when used like this bufferedWriter , When I changed the stack allocated object to heap allocated, add a single BufferedReaderWriter can get both reader and writer buffer.

Now the reader buffer is 16K for TLS, somewhat small, but user may choose a big buffer size, that may overflow stack for small stack size machine. That voilate allocate small size memory on stack, big size memory on heap.

So my attention here is:

keep the buffer static in struct and used by allocator.create .
or make the buffer dynamic and change the interface allocator based,
or keep the current usage case is fine like.

hawkbee · 2023-07-16T01:55:51Z

when test this branch to get https://github.com/MasterQ32/parser-toolkit/archive/master.tar.gz
use example program, the following ocurr.

thread 365352 panic: index out of bounds: index 16888, len 16645
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:1251:81: 0x382515 in finishRead2 (testhttp)
        @memcpy(c.partially_read_buffer[c.partial_ciphertext_idx + first.len ..][0..frag1.len], frag1);
                                                                                ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:1071:35: 0x37cac6 in readvAdvanced__anon_9983 (testhttp)
                return finishRead2(c, first, frag1, vp.total);
                                  ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:900:38: 0x37a889 in readvAtLeast__anon_9982 (testhttp)
        var amt = try c.readvAdvanced(stream, iovecs[vec_i..]);
                                     ^
/home/hequn/router/zig/zig/lib/std/crypto/tls/Client.zig:861:24: 0x37a682 in readAtLeast__anon_9980 (testhttp)
    return readvAtLeast(c, stream, &iovecs, len);
                       ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:168:48: 0x2abcfc in readAtLeast (testhttp)
            .tls => conn.tls_client.readAtLeast(conn.stream, buffer, len),
                                               ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:183:32: 0x27806a in read (testhttp)
        return conn.readAtLeast(buffer, 1);
                               ^
/home/hequn/router/zig/zig/lib/std/io/buffered_reader.zig:247:56: 0x38a16f in readAtLeast (testhttp)
                    n = try self.unbuffered_reader.read(dest[dest_index..]);
                                                       ^
/home/hequn/router/zig/zig/lib/std/io/buffered_reader.zig:209:36: 0x2b4a9a in read (testhttp)
            return self.readAtLeast(dest, 1);
                                   ^
/home/hequn/router/zig/zig/lib/std/http/protocol.zig:542:52: 0x2b411e in read__anon_9003 (testhttp)
                        const nread = try conn.read(buffer[0..can_read]);
                                                   ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:575:53: 0x28017c in transferRead (testhttp)
            const amt = try req.response.parser.read(&req.connection.?.data.bufreader, buf[index..], req.response.skip);
                                                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:703:41: 0x26c663 in read (testhttp)
            else => try req.transferRead(buffer),
                                        ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:732:33: 0x26c285 in readAll (testhttp)
            const amt = try read(req, buffer[index..]);
                                ^
/home/hequn/router/zig/zig/testhttp.zig:35:36: 0x26d0e7 in main (testhttp)
        const amt = try req.readAll(&buf);

or frequently this

error: HttpChunkInvalid
/home/hequn/router/zig/zig/lib/std/http/protocol.zig:555:37: 0x2b48a8 in read__anon_9003 (testhttp)
                        .invalid => return error.HttpChunkInvalid,
                                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:575:25: 0x280197 in transferRead (testhttp)
            const amt = try req.response.parser.read(&req.connection.?.data.bufreader, buf[index..], req.response.skip);
                        ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:703:21: 0x26c687 in read (testhttp)
            else => try req.transferRead(buffer),
                    ^
/home/hequn/router/zig/zig/lib/std/http/Client.zig:732:25: 0x26c2a0 in readAll (testhttp)
            const amt = try read(req, buffer[index..]);
                        ^
/home/hequn/router/zig/zig/testhttp.zig:35:21: 0x26d11d in main (testhttp)
        const amt = try req.readAll(&buf);

@andrewrk Sorry to distrurb, could you please have a look the stacktrace to see if this is bug in TLS trigger on this branch or had introduced a bug corruption memory somewhere.

Edit: After the following change, both HTTP/HTTPS request download file and toWriter copy file works.

diff --git a/lib/std/crypto/tls/Client.zig b/lib/std/crypto/tls/Client.zig
index 37306dd37..edbe50830 100644
--- a/lib/std/crypto/tls/Client.zig
+++ b/lib/std/crypto/tls/Client.zig
@@ -43,7 +43,7 @@ application_cipher: tls.ApplicationCipher,
 /// 3. unused
 /// The fields `partial_cleartext_idx`, `partial_ciphertext_idx`, and
 /// `partial_ciphertext_end` describe the span of the segments.
-partially_read_buffer: [tls.max_ciphertext_record_len]u8,
+partially_read_buffer: [tls.max_ciphertext_record_len * 4]u8,
 
 /// This is an example of the type that is needed by the read and write
 /// functions. It can have any fields but it must at least have these
@@ -952,7 +952,7 @@ pub fn readvAdvanced(c: *Client, stream: anytype, iovecs: []const std.os.iovec)
     // too small to fit the cleartext, which may be as large as `max_ciphertext_len`.
     var cleartext_stack_buffer: [max_ciphertext_len]u8 = undefined;
     // Temporarily stores ciphertext before decrypting it and giving it to `iovecs`.
-    var in_stack_buffer: [max_ciphertext_len * 4]u8 = undefined;
+    var in_stack_buffer: [max_ciphertext_len]u8 = undefined;
     // How many bytes left in the user's buffer.
     const free_size = vp.freeSize();
     // The amount of the user's buffer that we need to repurpose for storing

it does not gain extra benifits for adding this.

underlying buffer is smaller than tokenized token. Add small buffer test BufferedReader tokenize to work as expected. And BufferedReader peek method will work when least param greater than zero.

hawkbee · 2023-07-17T09:44:08Z

May be the same as #15226, #15590?

daurnimator · 2023-09-14T11:40:54Z

Some other thoughts related: The buffer of BufferedReader/BufferedWriter is embed in this struct, so the memory is allocated on stack when used like this bufferedWriter , When I changed the stack allocated object to heap allocated, add a single BufferedReaderWriter can get both reader and writer buffer.

Now the reader buffer is 16K for TLS, somewhat small, but user may choose a big buffer size, that may overflow stack for small stack size machine. That voilate allocate small size memory on stack, big size memory on heap.

So my attention here is:
1. keep  the buffer static in struct  and used by `allocator.create` .

2. or make the buffer dynamic and change the interface allocator based,

3. or keep the current usage case is fine [like](https://github.com/ziglang/zig/blob/3ec337484b80647ee23217533c1d62e914afa4f1/lib/std/net.zig#L1254).

FWIW, LinearFifo can swap between these different modes:

zig/lib/std/fifo.zig

Line 11 in 8fb4a4e

pub const LinearFifoBufferType = union(enum) {

You may want to look into implementing BufferedReader on top of LinearFifo

See #16114. This functionality should be added to bufferedReader instead.

andrewrk · 2024-03-14T02:27:53Z

This never got to the point where I was comfortable merging it as-is, and now it needs to be redone. I have opened #19297 to remove PeekStream and I will update #4501 to reflect that change.

See #16114. This functionality should be added to bufferedReader instead.

See ziglang#16114. This functionality should be added to bufferedReader instead.

hawkbee force-pushed the peekbuf branch from b8137cb to 6494945 Compare June 20, 2023 15:59

ifreund added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Jun 22, 2023

hawkbee force-pushed the peekbuf branch 2 times, most recently from 63eefb0 to dff951e Compare June 30, 2023 10:23

truemedian reviewed Jul 10, 2023

View reviewed changes

lib/std/http/Client.zig Outdated Show resolved Hide resolved

lib/std/http/Server.zig Outdated Show resolved Hide resolved

lib/std/http/Server.zig Outdated Show resolved Hide resolved

hawkbee force-pushed the peekbuf branch from 9d8d64c to b6d8f93 Compare July 11, 2023 12:06

hawkbee force-pushed the peekbuf branch from 22c95c5 to 74361d4 Compare July 14, 2023 10:52

hawkbee marked this pull request as draft July 14, 2023 23:49

hawkbee added 5 commits July 17, 2023 11:01

Combine peekstream and bufferedReader together.

4a42810

Give BufferedReader and BufferedWriter a real usage try.

1214d81

Should close socket when allocation error occurs.

c82fb4c

Remove BufferedReaderWriter as Andrew suggested,

6f230c6

it does not gain extra benifits for adding this.

Buffer iterator now will return a truncated token when

fb6dbf7

underlying buffer is smaller than tokenized token. Add small buffer test BufferedReader tokenize to work as expected. And BufferedReader peek method will work when least param greater than zero.

hawkbee force-pushed the peekbuf branch from 74361d4 to f619662 Compare July 17, 2023 04:27

Peek should return error of read so user can catch the reader error.

09a3ab0

hawkbee force-pushed the peekbuf branch from f619662 to 09a3ab0 Compare July 17, 2023 04:58

hawkbee marked this pull request as ready for review July 17, 2023 09:45

hawkbee mentioned this pull request Jul 18, 2023

tls.Client: Reduce temporary buffer size for reads to prevent buffer overflow. #16436

Closed

andrewrk added a commit that referenced this pull request Mar 14, 2024

remove std.io.PeekStream

36a435b

See #16114. This functionality should be added to bufferedReader instead.

andrewrk mentioned this pull request Mar 14, 2024

remove std.io.PeekStream #19297

Merged

andrewrk closed this Mar 14, 2024

andrewrk added a commit that referenced this pull request Mar 14, 2024

remove std.io.PeekStream

bc5b094

See #16114. This functionality should be added to bufferedReader instead.

RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Mar 20, 2024

remove std.io.PeekStream

9983e4d

See ziglang#16114. This functionality should be added to bufferedReader instead.

notcancername mentioned this pull request Dec 2, 2024

I/O peeking #22125

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Combine peekstream and bufferedReader together. #16114

Combine peekstream and bufferedReader together. #16114

hawkbee commented Jun 20, 2023 •

edited

Loading

andrewrk commented Jul 10, 2023 •

edited

Loading

truemedian left a comment

hawkbee commented Jul 11, 2023

andrewrk commented Jul 13, 2023

hawkbee commented Jul 14, 2023 •

edited

Loading

hawkbee commented Jul 16, 2023 •

edited

Loading

hawkbee commented Jul 17, 2023 •

edited

Loading

daurnimator commented Sep 14, 2023

andrewrk commented Mar 14, 2024

Combine peekstream and bufferedReader together. #16114

Combine peekstream and bufferedReader together. #16114

Conversation

hawkbee commented Jun 20, 2023 • edited Loading

andrewrk commented Jul 10, 2023 • edited Loading

truemedian left a comment

Choose a reason for hiding this comment

hawkbee commented Jul 11, 2023

andrewrk commented Jul 13, 2023

hawkbee commented Jul 14, 2023 • edited Loading

hawkbee commented Jul 16, 2023 • edited Loading

hawkbee commented Jul 17, 2023 • edited Loading

daurnimator commented Sep 14, 2023

andrewrk commented Mar 14, 2024

hawkbee commented Jun 20, 2023 •

edited

Loading

andrewrk commented Jul 10, 2023 •

edited

Loading

hawkbee commented Jul 14, 2023 •

edited

Loading

hawkbee commented Jul 16, 2023 •

edited

Loading

hawkbee commented Jul 17, 2023 •

edited

Loading