Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problems with GNFA.to_regex #122

Closed
colette-b opened this issue Jan 8, 2023 · 11 comments
Closed

problems with GNFA.to_regex #122

colette-b opened this issue Jan 8, 2023 · 11 comments
Labels

Comments

@colette-b
Copy link

GNFA.to_regex seems to give wrong results sometimes. For example, the following code:

from automata.fa.dfa import DFA
from automata.fa.nfa import NFA
from automata.fa.gnfa import GNFA

nfa = NFA.from_regex('a*')
dfa = DFA.from_nfa(nfa)
gnfa = GNFA.from_dfa(dfa)
print(gnfa.to_regex())

prints a single-character string *, which is not a valid regex.

@caleb531 caleb531 added the bug label Jan 8, 2023
@caleb531
Copy link
Owner

caleb531 commented Jan 8, 2023

@colette-b Thank you for the report! I can reproduce the issue on my end, as well.

automata py3 main ❯ python3
Python 3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from automata.fa.dfa import DFA
>>> from automata.fa.nfa import NFA
>>> from automata.fa.gnfa import GNFA
>>> 
>>> nfa = NFA.from_regex('a*')
>>> dfa = DFA.from_nfa(nfa)
>>> gnfa = GNFA.from_dfa(dfa)
>>> print(gnfa.to_regex())
*

@abhinavsinha-adrino I believe the GNFA class was your addition to the library. Do you have any thoughts on this?

cc @eliotwrobson (in case you have anything to add)

@eliotwrobson
Copy link
Collaborator

eliotwrobson commented Jan 8, 2023

@caleb531 I think it was @abhinavsinha-adrino that added GNFA (in PR #46). I'm not super familiar with the way that the GNFA works; though I did do some light refactoring of the to_regex() function to change it to use iteration (in #74), I don't think that would have introduced this bug. Might be worth double checking with v6 of the library.

I played with the example code and it looks like the dfa produced correctly recognizes the a* regex, so the issue is definitely in the GNFA class somewhere.

@caleb531
Copy link
Owner

caleb531 commented Jan 8, 2023

@eliotwrobson Oh yes, my apologies. Unfortunately, it does appear that this bug was introduced in the v7 release, because the bug is not present in v6:

automata py3 main ❯ git checkout v6.0.0
HEAD is now at bb6d5f9 Prepare v6.0.0 release
automata py3 HEAD ❯ python3
Python 3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from automata.fa.dfa import DFA
>>> from automata.fa.nfa import NFA
>>> from automata.fa.gnfa import GNFA
>>> 
>>> nfa = NFA.from_regex('a*')
>>> dfa = DFA.from_nfa(nfa)
>>> gnfa = GNFA.from_dfa(dfa)
>>> print(gnfa.to_regex())
(aa*)?

Here is a link to the diff between v6 and v7, for reference:
v6.0.0...v7.0.0#diff-f507a0a5351199af63295732c015178c1cd835aca088126d724a1f0f68a6b566

@eliotwrobson
Copy link
Collaborator

eliotwrobson commented Jan 8, 2023

Ah then that is my bad then 😅 I'll add this to the test suite and see if I can whip up a fix really quickly.

EDIT: it seems like this issue might not come up if you try converting directly from the NFA to the GNFA (no intermediate DFA)? Will play with it more to be sure but kinda weird.

@abhinavsinha-adrino
Copy link
Contributor

@eliotwrobson I think I found one of the problems (hopefully the only one 🥲, I haven't run anything yet)
Line 230 in gnfa.py (v7)
r2 = f'{r1}*'
in v6 this was:
r2 = '{}*'.format(r2)

This must have been generating errors with many other regular expressions.

@eliotwrobson
Copy link
Collaborator

@abhinavsinha-adrino ahhh yeah that would do it, thank you! Let me change that and see if it works. One of the other issues is in the code below:

automata/tests/test_gnfa.py

Lines 251 to 259 in f7acee8

nfa = NFA.from_regex('a(aaa*bbcd|abbcd)d*|aa*bb(dcc*|(d|c)b|a?bb(dcc*|(d|c)))ab(c|d)*(ccd)?')
gnfa = GNFA.from_nfa(nfa)
regex = gnfa.to_regex()
nfa = NFA.from_regex(regex)
dfa2 = DFA.from_nfa(nfa)
dfa = DFA.from_nfa(nfa)
self.assertEqual(dfa, dfa2)

This test will never raise an error, which is probably why we didn't catch this bug earlier.

@abhinavsinha-adrino
Copy link
Contributor

@eliotwrobson I did a lousy job in testing the code. This was my first open-source contribution, and I'm not a programmer, so I kind of hated the testing part 😖.

Maybe some more tests need to be added. I guess we can simply add 'a*' test case. Is it possible for you to add this with the fix?

@eliotwrobson
Copy link
Collaborator

Maybe some more tests need to be added. I guess we can simply add 'a*' test case. Is it possible for you to add this with the fix?

@abhinavsinha-adrino take a look at the PR I just put up (#123), it adds this along with the ability to easily add arbitrary regex to the test suite.

@abhinavsinha-adrino
Copy link
Contributor

@eliotwrobson 👍, this fixes the issue.

@caleb531
Copy link
Owner

caleb531 commented Jan 8, 2023

Excellent! I have merged that PR, and am presently pushing a Automata v7.1.0 release containing this fix (as well as the other recent additions, like the ^ regex operator).

@caleb531
Copy link
Owner

caleb531 commented Jan 8, 2023

@colette-b Alright, v7.1.0 is released with this fix. Thank you for reporting this bug!

https://github.com/caleb531/automata/releases/tag/v7.1.0

I will be closing this issue now, but please let me know if you run into any issues.

@caleb531 caleb531 closed this as completed Jan 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants