IO#write does not support multiple arguments with different encodings #2829

Closed
mhib opened this issue Jan 15, 2023 · 4 comments

Comments

@mhib

mhib commented Jan 15, 2023

Command to reproduce the issue:

 ruby -rtempfile -e 'p Tempfile.open { |f| f.write("\x87".b, "ą") }'

It outputs 3 on both CRuby and JRuby, but crashes on TruffleRuby.
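A minimal Ruby sketch of what is going on, assuming the script is saved as UTF-8 (Ruby's default source encoding): the expected 3 is the combined byte size of the two arguments, while joining them into a single string, as TruffleRuby's core/io.rb does via Array#join, raises Encoding::CompatibilityError because a binary string with a non-ASCII byte and a non-ASCII UTF-8 string have incompatible encodings.

```ruby
# The two arguments from the reproduction command.
binary = "\x87".b # ASCII-8BIT string holding one non-ASCII byte
utf8 = "ą"        # UTF-8 string: two bytes, 0xC4 0x85

# IO#write returns the total number of bytes written.
total = binary.bytesize + utf8.bytesize
puts total # prints 3

# Concatenating them into one String requires compatible encodings,
# and these two are not compatible, so Array#join raises.
begin
  [binary, utf8].join
rescue Encoding::CompatibilityError => e
  puts "join failed: #{e.class}"
end
```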

@andrykonchin
Member

Thank you for reporting.

I can reproduce it on TruffleRuby master:

jt -q ruby -rtempfile -e 'p Tempfile.open { |f| f.write("\x87".b, "ą") }'
<internal:core> core/array.rb:675:in `<<': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
	from <internal:core> core/array.rb:675:in `block in join'
	from <internal:core> core/truffle/thread_operations.rb:75:in `detect_recursion'
	from <internal:core> core/array.rb:641:in `join'
	from <internal:core> core/io.rb:2330:in `write'
	from /Users/andrykonchin/projects/truffleruby-ws/graal/sdk/mxbuild/darwin-amd64/GRAALVM_D83C12B6D6_JAVA19/graalvm-d83c12b6d6-java19-23.0.0-dev/Contents/Home/languages/ruby/lib/mri/delegate.rb:349:in `write'
	from -e:1:in `block in <main>'
	from /Users/andrykonchin/projects/truffleruby-ws/graal/sdk/mxbuild/darwin-amd64/GRAALVM_D83C12B6D6_JAVA19/graalvm-d83c12b6d6-java19-23.0.0-dev/Contents/Home/languages/ruby/lib/mri/tempfile.rb:317:in `open'
	from -e:1:in `<main>'

@eregon
Member

eregon commented Jan 16, 2023

Interesting, this works on CRuby because IO#external_encoding is nil, so there is no conversion and the strings are just written as plain bytes.
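A small sketch to observe this on CRuby, assuming no default encodings are forced via `-E` or RUBYOPT: a file opened for writing without an explicit encoding reports a nil external_encoding, and the bytes of both arguments land in the file unchanged. The method name `write_mixed_encodings` is only for illustration.

```ruby
require "tempfile"

# Hypothetical demo helper: writes both arguments to a fresh temporary
# file and returns what CRuby reported and what ended up on disk.
def write_mixed_encodings
  Tempfile.create("io-write-demo") do |f|
    ext = f.external_encoding # nil: opened for writing, no encoding given
    written = f.write("\x87".b, "ą")
    f.rewind
    [ext, written, f.read.b] # read back the raw bytes
  end
end

ext, written, bytes = write_mixed_encodings
p ext     # nil
p written # 3
p bytes == "\x87\xC4\x85".b # true: both strings' bytes, unchanged
```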

One solution is to internally call write for each string (using a data.each loop), but that means one system call per string, which seems somewhat inefficient. Maybe still fine.
It might also be incorrect for Fiber scheduler purposes: I'm not sure whether CRuby concatenates the strings in that case or calls the hook for each string; that would be worth checking.
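The first idea can be sketched in plain Ruby as below. `write_each` is a hypothetical helper, not TruffleRuby's implementation: it issues one IO#write per argument and sums the returned byte counts, so the strings are never concatenated and their encodings are never checked against each other. An unbuffered implementation would turn each of those calls into its own system call, which is the inefficiency mentioned above.

```ruby
require "tempfile"

# Hypothetical helper: write each argument separately and add up the
# byte counts that IO#write returns for each call.
def write_each(io, *data)
  data.sum { |s| io.write(s) }
end

Tempfile.create("write-each-demo") do |f|
  p write_each(f, "\x87".b, "ą") # 3
end
```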

Another solution is to have a Primitive that concatenates the strings without looking at their encodings, but this is only OK if IO#write is not going to do any encoding conversion, so it seems a bit tricky but feasible.
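A plain-Ruby approximation of the second idea, assuming `String#b` as a stand-in for the encoding-agnostic Primitive (it copies each string, which a real Primitive would presumably avoid). `write_concatenated` is hypothetical: it only takes the concatenation path when io.external_encoding is nil, since that is the case where no conversion happens and joining raw bytes is equivalent to writing the strings one by one; otherwise it falls back to per-string writes.

```ruby
require "tempfile"

# Hypothetical helper: build one binary buffer from the raw bytes of all
# arguments and write it once, but only when no transcoding will happen.
def write_concatenated(io, *data)
  # With an external encoding set, each string may need its own conversion.
  return data.sum { |s| io.write(s) } unless io.external_encoding.nil?

  buffer = String.new(encoding: Encoding::BINARY)
  # String#b returns an ASCII-8BIT copy, so every << appends binary to
  # binary and the encodings are always compatible.
  data.each { |s| buffer << s.to_s.b }
  io.write(buffer)
end

Tempfile.create("concat-demo") do |f|
  p write_concatenated(f, "\x87".b, "ą") # 3
end
```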

@andrykonchin
Member

andrykonchin commented Jan 23, 2023

Regarding the Fiber scheduler: it looks like there are several scenarios for how CRuby's IO#write writes data and calls the scheduler when several arguments are passed:

  • data is written with the writev syscall and the scheduler is called for each String argument (if IO#sync = true and the data doesn't all fit in the IO internal buffer)
  • data is added to the IO internal buffer and the scheduler isn't called at all (if IO#sync = true and all the data fits in the IO internal buffer)
  • data is added to the IO internal buffer until it's full; then the buffer and the next String argument are written with the writev syscall, and the scheduler is called for each of them, but only once the buffer is full (if IO#sync = false)
  • a String argument that doesn't fit into the IO internal buffer is written with the write syscall and the scheduler is called for it (if IO#sync = false and the IO internal buffer is empty)

@andrykonchin
Member

Fixed in 360ec34

andrykonchin added this to the 23.0.0 Release milestone Jan 30, 2023
andrykonchin self-assigned this Jan 30, 2023