Multi-lined envelope reading corruption caused by direct reference into bufio.Reader's internal buffer rollover. #214

Merged
merged 1 commit into from
Jul 25, 2023

Conversation

jf-tech
Owner

@jf-tech jf-tech commented Jul 23, 2023

BUG: #213

If we're dealing with a multi-lined envelope (either by rows or by header/footer), readLine() will be called several times, so whatever ios.ByteReadLine (which uses bufio.Reader underneath) returned in a previous call may be invalidated by bufio.Reader's internal buffer rollover. If we then read the previous line directly, we get corrupted data.
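
To illustrate the rollover, here is a minimal sketch that uses only the standard library. It assumes a ReadSlice-style read that returns a slice into bufio.Reader's internal buffer; whether ios.ByteReadLine uses exactly this call is an assumption, and the helper name firstLineBeforeAndAfter is hypothetical. The buffer is shrunk to 16 bytes so two 11-byte lines force a rollover.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// firstLineBeforeAndAfter (hypothetical helper) reads two lines with ReadSlice
// and returns what the FIRST returned slice contains before and after the
// second read. string(...) copies, so each snapshot is stable.
func firstLineBeforeAndAfter(input string, bufSize int) (before, after string, err error) {
	r := bufio.NewReaderSize(strings.NewReader(input), bufSize)

	first, err := r.ReadSlice('\n') // first aliases r's internal buffer
	if err != nil {
		return "", "", err
	}
	before = string(first)

	// Per the bufio docs, the bytes in first stop being valid at the next read.
	if _, err = r.ReadSlice('\n'); err != nil {
		return "", "", err
	}
	after = string(first)
	return before, after, nil
}

func main() {
	before, after, err := firstLineBeforeAndAfter("AAAAAAAAAA\nBBBBBBBBBB\n", 16)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Printf("before second read: %q\n", before)
	fmt.Printf("after second read:  %q\n", after)
}
```

With a buffer this small, the second read has to slide the unread bytes to the front of the internal buffer and refill it, which overwrites the memory the first slice still points at. With the default 4096-byte buffer the same thing happens, just less often, which is why the corruption only shows up on some inputs.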

To fix the problem, the easiest solution would be to simply copy the []byte returned from ios.ByteReadLine every single time. But for files with a single-line envelope, which are the vast majority of cases, this copy is unnecessary and a burden on the GC. So the trick is to have a flag on reader.linesBuf's last element that tells whether it holds a reference into the bufio.Reader's internal buffer or a copy. Every time before we call a bufio.Reader read, we check the flag on reader.linesBuf's last element; if it is not a copy, we turn it into a copy.

This way, we optimize for the vast majority of cases without needing allocations, and avoid any potential corruption in the multi-lined envelope cases.
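
The copy-on-next-read idea can be sketched roughly as follows. This is not the code in reader.go: the types line and lineReader, the field copied, and the helper readEnvelope are hypothetical stand-ins for reader.linesBuf and its flag, and ReadSlice stands in for ios.ByteReadLine.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// line is a hypothetical stand-in for one element of reader.linesBuf.
type line struct {
	b      []byte
	copied bool // false: b may still alias the bufio.Reader's internal buffer
}

// lineReader is a hypothetical, stripped-down reader holding buffered lines.
type lineReader struct {
	r     *bufio.Reader
	lines []line
}

// readLine detaches the last buffered line from the bufio.Reader's internal
// buffer (only if it is still a reference), then reads the next line.
func (lr *lineReader) readLine() error {
	if n := len(lr.lines); n > 0 && !lr.lines[n-1].copied {
		lr.lines[n-1].b = append([]byte(nil), lr.lines[n-1].b...)
		lr.lines[n-1].copied = true
	}
	b, err := lr.r.ReadSlice('\n')
	if err != nil {
		return err
	}
	// The newest line is kept as a reference: no allocation unless another
	// read follows while this line is still buffered.
	lr.lines = append(lr.lines, line{b: b})
	return nil
}

// readEnvelope reads n lines as one envelope and returns them as strings.
func readEnvelope(input string, bufSize, n int) ([]string, error) {
	lr := &lineReader{r: bufio.NewReaderSize(strings.NewReader(input), bufSize)}
	for i := 0; i < n; i++ {
		if err := lr.readLine(); err != nil {
			return nil, err
		}
	}
	out := make([]string, 0, n)
	for _, l := range lr.lines {
		out = append(out, string(l.b))
	}
	return out, nil
}

func main() {
	got, err := readEnvelope("AAAAAAAAAA\nBBBBBBBBBB\n", 16, 2)
	fmt.Printf("%q %v\n", got, err)
}
```

For a single-line envelope, readLine runs once per envelope and the copy branch is never taken as long as the buffered line is consumed and cleared before the next read (clearing is omitted from this sketch). For a multi-lined envelope, each line except the last is copied exactly once, just before the read that could overwrite it.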

@paulstadler

@jf-tech jf-tech added the bug Something isn't working label Jul 23, 2023
@jf-tech jf-tech self-assigned this Jul 23, 2023
@codecov

codecov bot commented Jul 23, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (37370fa) 100.00% compared to head (5b7e14d) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #214   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3021      3027    +6     
=========================================
+ Hits          3021      3027    +6     
Impacted Files Coverage Δ
.../omniv21/fileformat/flatfile/fixedlength/reader.go 100.00% <100.00%> (ø)


@jf-tech jf-tech linked an issue Jul 23, 2023 that may be closed by this pull request
@jf-tech jf-tech merged commit 9e0c8da into master Jul 25, 2023
3 checks passed
@jf-tech jf-tech deleted the 213 branch July 25, 2023 01:32
Development

Successfully merging this pull request may close these issues.

buffer overflow with fixedlength2 + header/footer
2 participants