0.7.0: MatchData begin & end

@mudge mudge released this 25 Jan 16:27
· 297 commits to main since this release
v0.7.0

Thanks to @driskell, who raised #20 about functionality missing from RE2's MatchData (compared to MRI's), I'm happy to announce version 0.7.0 of re2. It now includes RE2::MatchData#begin and RE2::MatchData#end for finding the offsets of matches in your searches.

The API is the same as the standard library's begin and end:

m = RE2('w(o+)').match('he said woohoo!')
m.begin(0)
# => 8
m.end(0)
# => 11

It also works with RE2's named captures:

m = RE2('w(?P<cheers>o+)').match('he said woohoo!')
m.begin('cheers')
# => 9
m.end(:cheers)
# => 11

Note that on versions of Ruby prior to 1.9, the offset will be in bytes, while Ruby 1.9 and later will return the offset in characters. This is consistent with other string methods (such as length and slicing with []), so it gives the least surprising behaviour when dealing with multibyte characters. The specs for this feature illustrate this: they cannot rely on the exact return values of begin and end.
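To make the byte/character distinction concrete, here is a small sketch for Ruby 1.9 and later. It uses Ruby's built-in Regexp as a stand-in for RE2 (an assumption made only so the snippet runs without the gem installed), relying on the point above that the begin and end API is the same. The sample string and variable names are illustrative.

```ruby
# Sketch only: Ruby's built-in Regexp stands in for RE2 here.
# "\u2665" is a heart character that takes 3 bytes in UTF-8.
text = "I \u2665 woohoo!"
m = /w(o+)/.match(text)

char_offset = m.begin(0)            # offset in characters
# => 4
byte_offset = m.pre_match.bytesize  # the same position measured in bytes
# => 6

# Character offsets line up with String#[] ...
text[char_offset, 3]
# => "woo"
# ... while byte offsets line up with String#byteslice.
text.byteslice(byte_offset, 3)
# => "woo"
```

Because begin and end return character offsets on these Ruby versions, their results can be passed straight to String#[] without any conversion.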

As a technical aside: the trickiest part of implementing this was efficiently calculating the length of the string leading up to the match (that is, the offset), as different implementations of Ruby vary in their string functions. Prior to Ruby 1.9, the offset is calculated using simple pointer arithmetic. Ruby 1.9 and later will try to use rb_str_sublen when available, falling back to rb_str_length (and incurring the cost of an extra string allocation) on implementations such as Rubinius.
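The fallback strategy can be approximated in plain Ruby. The helper below is hypothetical and is not the gem's C implementation: it copies the bytes before the match into a new string and asks for its length in characters, which is roughly the "extra string allocation" cost mentioned above.

```ruby
# Hypothetical helper (not part of re2): convert a byte offset into a
# character offset by allocating the prefix and counting its characters.
def char_offset_for(string, byte_offset)
  string.byteslice(0, byte_offset).length
end

# With a multibyte character before the match, the two offsets differ.
char_offset_for("I \u2665 woohoo!", 6)
# => 4

# With ASCII-only text, byte and character offsets are the same.
char_offset_for("he said woohoo!", 8)
# => 8
```

A function like rb_str_sublen avoids the intermediate string, which is why it is preferred when the Ruby implementation provides it.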

Many thanks to @driskell for originally contributing this and providing invaluable feedback during its development.