Segfault when using priority union with large grammar #134

dowobeha · 2021-10-15T21:46:50Z

The following causes a segfault. Note that in this case FullLexicalToSurfaceGrammar is rather large, so I suspect that a memory issue may be implicated.

foma[0]: re FullLexicalToSurfaceGrammar .p. GuessToSurfaceGrammar;

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
37 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
#1 0x000000000041048d in nhash_insert ()
#2 0x000000000041211d in fsm_determinize ()
#3 0x000000000041d1ca in fsm_minimize ()
#4 0x0000000000427ed4 in fsm_completes ()
#5 0x000000000042b481 in fsm_priority_union_lower ()
#6 0x000000000043f3fe in yyparse ()
#7 0x0000000000437b35 in my_yyparse ()
#8 0x000000000040b2bb in interfacelex ()
#9 0x0000000000401c9b in main ()

simon-clematide · 2021-10-26T13:08:32Z

Do you have enough memory on your machine? Sometimes hundreds of GBs are necessary for compilation of large networks.

mhulden · 2021-10-26T18:24:05Z

You could try to implement the priority union from first principles instead of using the built-in operation. Then minimization gets done in every step and is (sometimes) faster with smaller intermediate results, like so:

regex Grammar | [Guesser .o. ~Grammar.l] ;

Sometimes it's effective to also do some seemingly redundant factorizations in the calculation, as they can also yield smaller intermediate transducers, in particular for the composition step. In particular, I've had some success with this one when other priority unions took too long or ate up memory:

regex Grammar | [Guesser .o. ~[Grammar.l & Guesser.l]] ;

I would probably try this second one first.

Both should produce the same result as:

regex Grammar .p. Guesser ;

dowobeha · 2021-10-26T18:45:35Z

Thanks for the tips. I believe the machine I tried on has 128 GB of RAM.

mhulden · 2021-10-27T10:38:00Z

If the earlier tip fails, sometimes the best way to solve this kind of memory issue is to break down the priority union in a piecewise fashion for different word lengths. Here's a macro I've used that breaks down the calculation separately for words of length <5,6,7,8,9,10,>10 and unions them all together. Of course, if your language has a different distribution of word lengths, you might want to adapt the below to suit it better.

def PiecewisePriority(Gram, Guess) [[Gram .o. ?^<5] | [Guess .o. ?^<5 .o. ~[?^<5 .o. Guess.l .o. Gram.l]]] |
      [[Gram .o. ?^5] | [Guess .o. ?^5 .o. ~[?^5 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^6] | [Guess .o. ?^6 .o. ~[?^6 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^7] | [Guess .o. ?^7 .o. ~[?^7 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^8] | [Guess .o. ?^8 .o. ~[?^8 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^9] | [Guess .o. ?^9 .o. ~[?^9 .o. Guess.l .o. Gram.l]]]    |
      [[Gram .o. ?^10] | [Guess .o. ?^10 .o. ~[?^10 .o. Guess.l .o. Gram.l]]] |
      [[Gram .o. ?^>10] | [Guess .o. ?^>10 .o. ~[?^>10 .o. Guess.l .o. Gram.l]]] ;

With that defined, you can then just issue:

regex PiecewisePriority(FullLexicalToSurfaceGrammar, GuessToSurfaceGrammar);

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segfault when using priority union with large grammar #134

Segfault when using priority union with large grammar #134

dowobeha commented Oct 15, 2021

simon-clematide commented Oct 26, 2021

mhulden commented Oct 26, 2021

dowobeha commented Oct 26, 2021

mhulden commented Oct 27, 2021

Segfault when using priority union with large grammar #134

Segfault when using priority union with large grammar #134

Comments

dowobeha commented Oct 15, 2021

simon-clematide commented Oct 26, 2021

mhulden commented Oct 26, 2021

dowobeha commented Oct 26, 2021

mhulden commented Oct 27, 2021