Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CSV.open with StringIO argument #302

Merged
merged 16 commits into from
Aug 5, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 44 additions & 18 deletions lib/csv.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1508,10 +1508,8 @@ def generate_lines(rows, **options)

#
# :call-seq:
# open(file_path, mode = "rb", **options ) -> new_csv
# open(io, mode = "rb", **options ) -> new_csv
# open(file_path, mode = "rb", **options ) { |csv| ... } -> object
# open(io, mode = "rb", **options ) { |csv| ... } -> object
# open(path_or_io, mode = "rb", **options ) -> new_csv
# open(path_or_io, mode = "rb", **options ) { |csv| ... } -> object
#
# possible options elements:
# keyword form:
Expand All @@ -1520,7 +1518,7 @@ def generate_lines(rows, **options)
# :undef => :replace # replace undefined conversion
# :replace => string # replacement string ("?" or "\uFFFD" if not specified)
#
# * Argument +path+, if given, must be the path to a file.
# * Argument +path_or_io+, must be a file path or an \IO stream.
# :include: ../doc/csv/arguments/io.rdoc
# * Argument +mode+, if given, must be a \File mode.
# See {Access Modes}[https://docs.ruby-lang.org/en/master/File.html#class-File-label-Access+Modes].
Expand All @@ -1544,6 +1542,9 @@ def generate_lines(rows, **options)
# path = 't.csv'
# File.write(path, string)
#
# string_io = StringIO.new
# string_io << "foo,0\nbar,1\nbaz,2\n"
#
# ---
#
# With no block given, returns a new \CSV object.
Expand All @@ -1556,6 +1557,9 @@ def generate_lines(rows, **options)
# csv = CSV.open(File.open(path))
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
#
# Create a \CSV object using a \StringIO:
# csv = CSV.open(string_io)
# csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
# ---
#
# With a block given, calls the block with the created \CSV object;
Expand All @@ -1573,16 +1577,24 @@ def generate_lines(rows, **options)
# Output:
# #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
#
# Using a \StringIO:
# csv = CSV.open(string_io) {|csv| p csv}
# csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
# Output:
# #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
# ---
#
# Raises an exception if the argument is not a \String object or \IO object:
# # Raises TypeError (no implicit conversion of Symbol into String)
# CSV.open(:foo)
def open(filename, mode="r", **options)
def open(filename_or_io, mode="r", **options)
# wrap a File opened with the remaining +args+ with no newline
# decorator
file_opts = {}
may_enable_bom_deletection_automatically(mode, options, file_opts)
may_enable_bom_detection_automatically(filename_or_io,
mode,
options,
file_opts)
file_opts.merge!(options)
unless file_opts.key?(:newline)
file_opts[:universal_newline] ||= false
Expand All @@ -1592,14 +1604,23 @@ def open(filename, mode="r", **options)
options.delete(:replace)
options.delete_if {|k, _| /newline\z/.match?(k)}

begin
f = File.open(filename, mode, **file_opts)
rescue ArgumentError => e
raise unless /needs binmode/.match?(e.message) and mode == "r"
mode = "rb"
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
retry
if filename_or_io.is_a?(StringIO)
f = if RUBY_VERSION >= "2.7"
StringIO.new(filename_or_io.string, **file_opts)
else
StringIO.new(filename_or_io.string)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we always use an empty file_opts for Ruby 2.7 or earlier instead?

diff --git a/lib/csv.rb b/lib/csv.rb
index 5f75894..6f94bd4 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -1605,11 +1605,7 @@ class CSV
       options.delete_if {|k, _| /newline\z/.match?(k)}
 
       if filename_or_io.is_a?(StringIO)
-        f = if RUBY_VERSION >= "2.7"
-          StringIO.new(filename_or_io.string, **file_opts)
-        else
-          StringIO.new(filename_or_io.string)
-        end
+        f = StringIO.new(filename_or_io.string, **file_opts)
       else
         begin
           f = File.open(filename_or_io, mode, **file_opts)
@@ -1911,7 +1907,11 @@ class CSV
                                                mode,
                                                options,
                                                file_opts)
-      unless filename_or_io.is_a?(StringIO)
+      if filename_or_io.is_a?(StringIO)
+        # Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support:
+        # https://github.com/ruby/stringio/pull/47
+        return if RUBY_VERSION < "3.0"
+      else
         # "bom|utf-8" may be buggy on Windows:
         # https://bugs.ruby-lang.org/issues/20526
         return if ON_WINDOWS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work because CSV#open still adds keys to file_opts here:

csv/lib/csv.rb

Lines 1586 to 1589 in bb93c28

file_opts.merge!(options)
unless file_opts.key?(:newline)
file_opts[:universal_newline] ||= false
end

For Ruby < 2.7, if file_opts is not empty, we get TypeError: no implicit conversion of Hash into String.

I like that something fails if options is given to CSV#open when it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the old StringIO returns. I can go that route, by rescuing the TypeError, and raising another exception with a clearer message. In any case, the file_opts[:universal_newline] ||= false would still have to be removed for < 2.7.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work because CSV#open still adds keys to file_opts here:

csv/lib/csv.rb

Lines 1586 to 1589 in bb93c28

file_opts.merge!(options)
unless file_opts.key?(:newline)
file_opts[:universal_newline] ||= false
end

Oh, sorry.

For Ruby < 2.7, if file_opts is not empty, we get TypeError: no implicit conversion of Hash into String.

I like that something fails if options is given to CSV#open when it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the old StringIO returns. I can go that route, by rescuing the TypeError, and raising another exception with a clearer message.

I agree with you but can we use raise ... unless file_opts.empty? for it instead of rescuing the TypeError?

In any case, the file_opts[:universal_newline] ||= false would still have to be removed for < 2.7.

I agree with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out passing an empty file_opts doesn't work either. I've noticed this quirk from the interpreter in Ruby 2.6:

StringIO.new("bla", **{}) # => #<StringIO:0x0000563108bb88b8>

opts = {}
StringIO.new("bla", **opts) # => TypeError (no implicit conversion of Hash into String)

To address that, I've changed the StringIO initialization in Ruby < 2.7, so that it:

  • ignores options that are meant only for CSV.new
  • raises an ArgumentError if anything else is passed as options to CSV.open
  • initializes StringIO without keyword arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's use the approach.

else
begin
f = File.open(filename_or_io, mode, **file_opts)
rescue ArgumentError => e
raise unless /needs binmode/.match?(e.message) and mode == "r"
mode = "rb"
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
retry
end
end

begin
csv = new(f, **options)
rescue Exception
Expand Down Expand Up @@ -1886,10 +1907,15 @@ def table(path, **options)
private_constant :ON_WINDOWS

private
def may_enable_bom_deletection_automatically(mode, options, file_opts)
# "bom|utf-8" may be buggy on Windows:
# https://bugs.ruby-lang.org/issues/20526
return if ON_WINDOWS
def may_enable_bom_detection_automatically(filename_or_io,
mode,
options,
file_opts)
unless filename_or_io.is_a?(StringIO)
# "bom|utf-8" may be buggy on Windows:
# https://bugs.ruby-lang.org/issues/20526
return if ON_WINDOWS
end
return unless Encoding.default_external == Encoding::UTF_8
return if options.key?(:encoding)
return if options.key?(:external_encoding)
Expand Down
18 changes: 18 additions & 0 deletions test/csv/interface/test_read.rb
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,24 @@ def test_foreach
assert_equal(@rows, rows)
end

def test_foreach_stringio
string_io = StringIO.new(@data)
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a
assert_equal(@rows, rows)
end

def test_foreach_stringio_with_bom
if RUBY_VERSION < "2.7"
# Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support:
# https://github.com/ruby/stringio/pull/47
omit("StringIO's BOM support isn't available with Ruby < 2.7")
end

string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a
assert_equal(@rows, rows)
end

if respond_to?(:ractor)
ractor
def test_foreach_in_ractor
Expand Down