Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested regexes are bad and result in ambiguous matches #663

Merged
merged 1 commit into from
Mar 13, 2015

Conversation

rwz
Copy link
Contributor

@rwz rwz commented Apr 5, 2014

Imagine I'm writing a new scenario that includes something similar to this:

When I press "Submit" button
Then I should see "can't be blank" validation error on "Email"

first time I run this new scenario, cucumber will suggest me code snippets to start implementing missing steps. The snippet for validation step would be

Then(/^I should see "(.*?)" validation error on "(.*?)"$/) do |arg1, arg2|
  pending # express the regexp above with the code you wish you had
end

Later I add another scenario that includes this:

Then I should see "foobar"

and cucumber, again, helpfully suggest a snippet to start implementing the step:

Then(/^I should see "(.*?)"$/) do |arg1|

The problem is, the suggested regexp also matches everything, a previously defined validation step does resulting with lot of previously green steps suddenly go red with Cucumber::Ambiguous error.

@rwz
Copy link
Contributor Author

rwz commented Apr 5, 2014

Simplest solution would be to suggest a more robust regexp that does't allow quotes inside quoted value:

Then /^I should see "([^"]*)"$/ do |arg1|

or even better:

Then /^I should see "((?:\\"|[^"])*)"$/ do |arg1|

The second option allows escaped quotes inside a quoted value, like this:

Then I should see "press \"proceed\" to continue"

I'd personally prefer the latter, but I could see how such regexp might confuse a lot of people not used to seeing things like this.

@mattwynne
Copy link
Member

See #237 for background on this.

TBH I'd prefer not to change this back at the moment, as we have bigger fish to fry with the 2.0 release. When we're done with that I'd like to concentrate on bringing in Turnip-style placeholders which hide the regexp altogether.

@aslakhellesoy
Copy link
Contributor

I think turnip-style patterns would be a step in the right direction for all cucumber implementations. Behat 3 already supports them.

Does it make sense to create a small turnip-pattern library for the various programming languages? /cc @jnicklas @everzet @gasparnagy @jbpros

@mattwynne mattwynne added this to the 2.1 milestone Nov 11, 2014
@mattwynne mattwynne added the 🚦 needs tests Needs tests label Nov 11, 2014
@mattwynne
Copy link
Member

I'm going to close this PR. I do appreciate it, thanks, but I'd prefer to keep the generated snippets non-intimidating for the 80/20 use-case. I hope we can use your more robust ones when #708 is implemented.

Any objections?

@mattwynne mattwynne closed this Mar 1, 2015
@aslakhellesoy
Copy link
Contributor

@mattwynne are you saying ([^"]*) is more intimidating than (.*?)? (I would disagree).

Cucumber suggested ([^"]*) in snippets for many years until we accepted a poor patch that switched it to (.*?) - a regression IMO.

I think we should switch it back - it should be easy to do.

@aslakhellesoy
Copy link
Contributor

Let me elaborate on why I disagree that ([^"]*) is more intimidating than (.*?) and why I think this PR should be applied.

Let's assume we have two kinds of users - those who know enough regular expressions to understand both styles, and those who don't. There might also be some people who understand one style, but not the other, but that is probably a very small percentage, so let's ignore them for the sake of this argument.

The users who do understand regular expressions are unlikely to find either style more intimidating than the other.

The users who don't understand regular expressions are likely to find both styles equally intimidating.

Of course, I'm only hypothesising here, but so are you @mattwynne. We don't have data to support any of these hypotheses.

Since we don't have data to support either hypothesis we should decide what style to use based on which one is most correct and causes the least surprise during use. I would argue that even if we did have data to support the claim that people find ([^"]*) more intimidating than (.*?), we should still use ([^"]*) because it is more correct.

@rwz has already made a good case that the (.*?) can cause surprising behaviour while the original ([^"]*) will not.

@rwz
Copy link
Contributor Author

rwz commented Mar 1, 2015

I'd say that .*? is arguably more confusing than straightforward [^"]*. In my experience a lot of people do not understand the concept of non-greedy ? modifier for *.

I find myself explaining this regexp to others a lot more often than [^"]*.

@mattwynne
Copy link
Member

Oh, the only intimidation I'm thinking about is in the aesthetic. I've observed that people who don't understand regexps at all find the [^"]* form ugly and therefore intimidating. However, I'm convinced by both your points of view, and we'll merge this in for release with 2.0.

@mattwynne mattwynne modified the milestones: 2.0, 2.1 Mar 2, 2015
@mattwynne mattwynne merged commit 00da22e into cucumber:master Mar 13, 2015
mattwynne added a commit that referenced this pull request Mar 13, 2015
@mattwynne
Copy link
Member

Thanks @rwz!

@lock
Copy link

lock bot commented Oct 25, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🚦 needs tests Needs tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants