Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when using priority union with large grammar #134

Open
dowobeha opened this issue Oct 15, 2021 · 4 comments
Open

Segfault when using priority union with large grammar #134

dowobeha opened this issue Oct 15, 2021 · 4 comments

Comments

@dowobeha
Copy link
Contributor

The following causes a segfault. Note that in this case FullLexicalToSurfaceGrammar is rather large, so I suspect that a memory issue may be implicated.

foma[0]: re FullLexicalToSurfaceGrammar .p. GuessToSurfaceGrammar;

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
37 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
#1 0x000000000041048d in nhash_insert ()
#2 0x000000000041211d in fsm_determinize ()
#3 0x000000000041d1ca in fsm_minimize ()
#4 0x0000000000427ed4 in fsm_completes ()
#5 0x000000000042b481 in fsm_priority_union_lower ()
#6 0x000000000043f3fe in yyparse ()
#7 0x0000000000437b35 in my_yyparse ()
#8 0x000000000040b2bb in interfacelex ()
#9 0x0000000000401c9b in main ()

@simon-clematide
Copy link

Do you have enough memory on your machine? Sometimes hundreds of GBs are necessary for compilation of large networks.

@mhulden
Copy link
Owner

mhulden commented Oct 26, 2021

You could try to implement the priority union from first principles instead of using the built-in operation. Then minimization gets done in every step and is (sometimes) faster with smaller intermediate results, like so:

regex Grammar | [Guesser .o. ~Grammar.l] ;

Sometimes it's effective to also do some seemingly redundant factorizations in the calculation, as they can also yield smaller intermediate transducers, in particular for the composition step. In particular, I've had some success with this one when other priority unions took too long or ate up memory:

regex Grammar | [Guesser .o. ~[Grammar.l & Guesser.l]] ;

I would probably try this second one first.

Both should produce the same result as:

regex Grammar .p. Guesser ;

@dowobeha
Copy link
Contributor Author

Thanks for the tips. I believe the machine I tried on has 128 GB of RAM.

@mhulden
Copy link
Owner

mhulden commented Oct 27, 2021

If the earlier tip fails, sometimes the best way to solve this kind of memory issue is to break down the priority union in a piecewise fashion for different word lengths. Here's a macro I've used that breaks down the calculation separately for words of length <5,6,7,8,9,10,>10 and unions them all together. Of course, if your language has a different distribution of word lengths, you might want to adapt the below to suit it better.

def PiecewisePriority(Gram, Guess) [[Gram .o. ?^<5] | [Guess .o. ?^<5 .o. ~[?^<5 .o. Guess.l .o. Gram.l]]] |
      [[Gram .o. ?^5] | [Guess .o. ?^5 .o. ~[?^5 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^6] | [Guess .o. ?^6 .o. ~[?^6 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^7] | [Guess .o. ?^7 .o. ~[?^7 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^8] | [Guess .o. ?^8 .o. ~[?^8 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^9] | [Guess .o. ?^9 .o. ~[?^9 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^10] | [Guess .o. ?^10 .o. ~[?^10 .o. Guess.l .o. Gram.l]]] |
      [[Gram .o. ?^>10] | [Guess .o. ?^>10 .o. ~[?^>10 .o. Guess.l .o. Gram.l]]] ;

With that defined, you can then just issue:

regex PiecewisePriority(FullLexicalToSurfaceGrammar, GuessToSurfaceGrammar);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants