0.6.0: Introducing RE2::Scanner

@mudge mudge released this 01 Feb 23:48

Scanning

Thanks to a suggestion from Matthias Kadenbach, re2 now contains an API for incrementally scanning a string for matches. To use it, call scan on an instance of RE2::Regexp with the string you want to search:

scanner = RE2('(\d+)').scan("Some 1 long 23 string 4 containing 567 numbers")
scanner.scan #=> ["1"]
scanner.scan #=> ["23"]

The scanner in the example above is an instance of RE2::Scanner. Its main method, scan, returns the next match each time it is called and returns nil once no more matches are found. You can use rewind to reset a scanner back to the beginning of the string.

The RE2::Scanner class also includes Ruby's Enumerable module, so you can call each and to_enum on it:

scanner = RE2('(\d+)').scan("Some 1 long 23 string 4 containing 567 numbers")
scanner.each do |match|
  puts match
end
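
To make the scanner contract concrete (scan returns the next match, then nil; rewind starts over; each drives Enumerable), here is a minimal pure-Ruby sketch. It is an illustration only: SketchScanner is a hypothetical class built on Ruby's own Regexp engine, not RE2, and it does not reflect how re2 is actually implemented.

```ruby
# Hypothetical stand-in that mimics the behaviour described above.
# It uses Ruby's built-in Regexp, not RE2.
class SketchScanner
  include Enumerable

  def initialize(pattern, text)
    @pattern = pattern
    @text = text
    @pos = 0
  end

  # Return the capture groups of the next match, or nil when none remain.
  def scan
    return nil if @pos > @text.length

    m = @text.match(@pattern, @pos)
    if m.nil?
      @pos = @text.length + 1 # remember that the input is exhausted
      return nil
    end

    # Step past an empty match so the scanner always makes progress.
    @pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
    m.captures
  end

  # Reset to the beginning of the string.
  def rewind
    @pos = 0
    self
  end

  # Yield every match from the start; without a block, return an Enumerator.
  def each
    return to_enum(:each) unless block_given?

    rewind
    while (match = scan)
      yield match
    end
  end
end

scanner = SketchScanner.new(/(\d+)/, "Some 1 long 23 string 4 containing 567 numbers")
scanner.scan   #=> ["1"]
scanner.scan   #=> ["23"]
scanner.rewind
scanner.scan   #=> ["1"]
scanner.to_a   #=> [["1"], ["23"], ["4"], ["567"]]
scanner.scan   #=> nil
```

Because each rewinds before iterating, Enumerable methods such as to_a and map always see every match in this sketch, regardless of how far scan had already advanced.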

No more in-place replacement

This release removes methods that previously altered strings in-place. This means re2_sub! and re2_gsub! are gone and RE2.Replace and RE2.GlobalReplace now return new strings rather than modifying their input.
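
The difference between altering a string in place and returning a new one can be illustrated with Ruby's own String#sub and String#sub!. This is only an analogy using the standard library, not the re2 API. The practical consequence is the same, though: with a non-mutating method you must use the return value, because the input is left unchanged.

```ruby
original = "hello world"

# Non-mutating: returns a new string and leaves the receiver untouched.
replaced = original.sub("hello", "goodbye")
original #=> "hello world"
replaced #=> "goodbye world"

# Mutating: changes the receiver itself (the style this release removes).
mutable = "hello world".dup
mutable.sub!("hello", "goodbye")
mutable #=> "goodbye world"
```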

Encoding awareness

Again thanks to Matthias Kadenbach, this time for a bug report: in Ruby 1.9 and later, re2 now sets the correct encoding on the strings it returns.

m = RE2('(\w+)', :utf8 => true).match("foo")
m[1].encoding # => #<Encoding:UTF-8>