-
Notifications
You must be signed in to change notification settings - Fork 122
Allow CSV.open with StringIO argument #302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/csv.rb
Outdated
| pos = filename_or_io.pos | ||
| f = StringIO.new(filename_or_io.read) | ||
| filename_or_io.seek(pos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use StringIO#string here?
| pos = filename_or_io.pos | |
| f = StringIO.new(filename_or_io.read) | |
| filename_or_io.seek(pos) | |
| f = StringIO.new(filename_or_io.string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd then run into an issue similar to #98, which I confirmed with another test:
def test_foreach_stringio_with_bom
s = StringIO.new
s << "\ufeff" # BOM
s << @data
s.rewind
s.read(3)
rows = CSV.foreach(s, col_sep: "\t", row_sep: "\r\n").to_a
assert_equal(@rows, rows)
endThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the most relevant counter-argument is, CSV.new also uses read, so using read here is also a matter of consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Could you also add a test for the BOM case?
How about using StringIO#set_encoding_by_bom for it?
f = StringIO.new(filename_or_io.string)
unless f.set_encoding_by_bom
f.set_encoding(filename_or_io.external_encoding)
endCSV.new uses already opened IO. So it must not ignore the current position. So we must use read.
But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.
Could you check the current behavior of CSV.open with sought real IO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.
You're right, CSV.open reads the whole file content. I changed open again to use string instead of read.
Could you also add a test for the BOM case?
Sure thing. I added it, and together with using the **file_opts in the new StringIO, it's handled correctly.
How about using StringIO#set_encoding_by_bom for it?
I couldn't see any effect of doing it. It seems like including the file_opts is enough. Or is there a case I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see any effect of doing it.
Ah, b706d91 is a trick for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything you still feel is missing, after I've added file_opts with the trick from b706d91 ?
|
Could you rebase on main? Our CI on master broken... and I've fixed it now. |
Thank you, I've rebase it now. |
|
Thanks. But sorry... The master CI was broken... I've fixed it. Could you rebase on master again? |
No problem, I rebased it now. |
|
I'm assuming the test failures are related to https://bugs.ruby-lang.org/issues/20526, right? |
|
Hmm. It should be fixed by 47bf76f . |
|
Rebased it, and removed a now unneeded check when closing the IO within |
|
Ah, the failed test is an added test. Does this work? diff --git a/lib/csv.rb b/lib/csv.rb
index 7511ced..d2d1c57 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -1591,7 +1591,10 @@ class CSV
# wrap a File opened with the remaining +args+ with no newline
# decorator
file_opts = {}
- may_enable_bom_deletection_automatically(mode, options, file_opts)
+ may_enable_bom_deletection_automatically(filename_or_io,
+ mode,
+ options,
+ file_opts)
file_opts.merge!(options)
unless file_opts.key?(:newline)
file_opts[:universal_newline] ||= false
@@ -1900,10 +1903,15 @@ class CSV
private_constant :ON_WINDOWS
private
- def may_enable_bom_deletection_automatically(mode, options, file_opts)
- # "bom|utf-8" may be buggy on Windows:
- # https://bugs.ruby-lang.org/issues/20526
- return if ON_WINDOWS
+ def may_enable_bom_deletection_automatically(filename_or_io,
+ mode,
+ options,
+ file_opts)
+ unless filename_or_io.is_a?(StringIO)
+ # "bom|utf-8" may be buggy on Windows:
+ # https://bugs.ruby-lang.org/issues/20526
+ return if ON_WINDOWS
+ end
return unless Encoding.default_external == Encoding::UTF_8
return if options.key?(:encoding)
return if options.key?(:external_encoding) |
|
Sorry, I was away for a while. I saw that support to |
…nstead of reusing it
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
This check had been added before we started copying the StringIO argument passed to `#open`, as part of instantiating CSV. Right now we can call `csv.close` even if the underlying IO is a StringIO, without any unexpected side effect.
test/csv/interface/test_read.rb
Outdated
| if RUBY_VERSION >= "2.7" | ||
| # Support to StringIO was dropped for Ruby 2.6 and earlier: | ||
| # https://github.com/ruby/stringio/pull/47 | ||
| def test_foreach_stringio | ||
| string_io = StringIO.new(@data) | ||
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | ||
| assert_equal(@rows, rows) | ||
| end | ||
|
|
||
| def test_foreach_stringio_with_bom | ||
| string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | ||
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | ||
| assert_equal(@rows, rows) | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does test_foreach_stringio work with Ruby 2.7? If so, could you omit only test_foreach_stringio_with_bom?
| if RUBY_VERSION >= "2.7" | |
| # Support to StringIO was dropped for Ruby 2.6 and earlier: | |
| # https://github.com/ruby/stringio/pull/47 | |
| def test_foreach_stringio | |
| string_io = StringIO.new(@data) | |
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
| assert_equal(@rows, rows) | |
| end | |
| def test_foreach_stringio_with_bom | |
| string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | |
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
| assert_equal(@rows, rows) | |
| end | |
| end | |
| def test_foreach_stringio | |
| string_io = StringIO.new(@data) | |
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
| assert_equal(@rows, rows) | |
| end | |
| def test_foreach_stringio_with_bom | |
| if RUBY_VERSION < "2.7" | |
| # Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support: | |
| # https://github.com/ruby/stringio/pull/47 | |
| omit("StringIO's BOM support isn't available with Ruby < 2.7") | |
| end | |
| string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | |
| rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
| assert_equal(@rows, rows) | |
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does if I also initialize StringIO without additional options. I just pushed another commit like that. It's unfortunate having to run an additional RUBY_VERSION check at runtime, but at least that's only when calling open.
lib/csv.rb
Outdated
| f = if RUBY_VERSION >= "2.7" | ||
| StringIO.new(filename_or_io.string, **file_opts) | ||
| else | ||
| StringIO.new(filename_or_io.string) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we always use an empty file_opts for Ruby 2.7 or earlier instead?
diff --git a/lib/csv.rb b/lib/csv.rb
index 5f75894..6f94bd4 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -1605,11 +1605,7 @@ class CSV
options.delete_if {|k, _| /newline\z/.match?(k)}
if filename_or_io.is_a?(StringIO)
- f = if RUBY_VERSION >= "2.7"
- StringIO.new(filename_or_io.string, **file_opts)
- else
- StringIO.new(filename_or_io.string)
- end
+ f = StringIO.new(filename_or_io.string, **file_opts)
else
begin
f = File.open(filename_or_io, mode, **file_opts)
@@ -1911,7 +1907,11 @@ class CSV
mode,
options,
file_opts)
- unless filename_or_io.is_a?(StringIO)
+ if filename_or_io.is_a?(StringIO)
+ # Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support:
+ # https://github.com/ruby/stringio/pull/47
+ return if RUBY_VERSION < "3.0"
+ else
# "bom|utf-8" may be buggy on Windows:
# https://bugs.ruby-lang.org/issues/20526
return if ON_WINDOWSThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because CSV#open still adds keys to file_opts here:
Lines 1586 to 1589 in bb93c28
| file_opts.merge!(options) | |
| unless file_opts.key?(:newline) | |
| file_opts[:universal_newline] ||= false | |
| end |
For Ruby < 2.7, if file_opts is not empty, we get TypeError: no implicit conversion of Hash into String.
I like that something fails if options is given to CSV#open when it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the old StringIO returns. I can go that route, by rescuing the TypeError, and raising another exception with a clearer message. In any case, the file_opts[:universal_newline] ||= false would still have to be removed for < 2.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because
CSV#openstill adds keys tofile_optshere:Lines 1586 to 1589 in bb93c28
file_opts.merge!(options) unless file_opts.key?(:newline) file_opts[:universal_newline] ||= false end
Oh, sorry.
For Ruby < 2.7, if
file_optsis not empty, we getTypeError: no implicit conversion of Hash into String.I like that something fails if
optionsis given toCSV#openwhen it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the oldStringIOreturns. I can go that route, by rescuing theTypeError, and raising another exception with a clearer message.
I agree with you but can we use raise ... unless file_opts.empty? for it instead of rescuing the TypeError?
In any case, the
file_opts[:universal_newline] ||= falsewould still have to be removed for< 2.7.
I agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out passing an empty file_opts doesn't work either. I've noticed this quirk from the interpreter in Ruby 2.6:
StringIO.new("bla", **{}) # => #<StringIO:0x0000563108bb88b8>
opts = {}
StringIO.new("bla", **opts) # => TypeError (no implicit conversion of Hash into String)To address that, I've changed the StringIO initialization in Ruby < 2.7, so that it:
- ignores options that are meant only for
CSV.new - raises an
ArgumentErrorif anything else is passed asoptionstoCSV.open - initializes
StringIOwithout keyword arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's use the approach.
|
Thanks. |
Fix #300