Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(developer): context(n) counts the nul statement as a character in Web & android #13276

Open
2 tasks done
Tracked by #2239
O1Anas opened this issue Feb 18, 2025 · 4 comments
Open
2 tasks done
Tracked by #2239
Assignees
Milestone

Comments

@O1Anas
Copy link

O1Anas commented Feb 18, 2025

I use an arabic phonetic keyboard that I made on my own, It is documented here.
I am making a mobile version of it and I'm facing an issue/bug:

At this line:

nul 'ْ' any(cons) > 'ال' context(2) 'ّ'
  • I typed at the beginning of the line: "es"/"ْس" (ْ س)
  • In the web server/link (KeymanWeb), I recieved: "الّْ" (ا ل ْ ّ) (the context(n) statement in the output probably handled the nul statement in the input as a character/countable block)
  • I expected: "السّ" (ا ل س ّ) (This is what I got in the keyman developer debugger)

On the other hand, at this line:

nul 'ْ' any(cons) > 'ال' context(3) 'ّ'
  • I recieved: "الّ" (ا ل ّ) (Now nul wasn't counted for some reason)

This could be another issue/bug where any() statements aren't saved to get called back with the context(n) statement
This bug is consistent in all other rules, like this one for example:

nul 'ل' any(diacritic) 'ْ' any(cons) > 'ل' context(3) 'ل' context(5) 'ّ'
  • I type: "lies"/"لِْس" (ل ِ ْ س)
  • I recieve: "للّ" (ل ل ّ) (It ignores all context(n) statements when outputting)

I genuinely can't find an explanation or a fix for these issues, please find a fix for them or give me the solution that's already been found :)

Related issues

#12987

& maybe:
#13040

Keyman apps

  • Keyman for Android
  • KeymanWeb

Keyman version

tested on V17.0 & V18.0.193

Language name

Arabic (my own custom keyboard)

Additional context

The behaviour is normal when debugging in keyman developer

@mcdurdin
Copy link
Member

mcdurdin commented Feb 19, 2025

Thank you for this clear and detailed report. I have managed to reproduce this issue with Keyman Developer 18.0.190-beta (will be same as .193) and have built a minimal reproduction with the following keyboard.

To reproduce the problem, type es:

On kmx-based platforms, this produces '12b3'.
On web-based platforms, this produces '12a3'.

store(&TARGETS) 'any'

begin Unicode > use(main)

store(cons) 'bcd'

group(main) using keys

+ 'e' > 'a'
+ 's' > 'b'
match > use(post)

group(post)

nul 'a' any(cons) > '12' context(2) '3'

The web compiler produces the following function for the post group in test_13276.js:

  this.g_post_1=function(t,e) {
    var k=KeymanWeb,r=1,m=0;
      if(k.KN(2,t)&&k.KCM(2,t,"a",1)&&k.KA(1,k.KC(1,1,t),this.s_cons_4)){
        m=1;   // Line 15
        k.KO(2,t,"12");
        k.KO(-1,t,"a3");
      }
    return r;
  };

KMX compiler produces the following for that rule:

context: UC_SENTINEL CODE_NUL[0004] U+0061 UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0031 U+0032 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002 U+0033

We can see how the compiler has selected "a" as the character to output, rather than referencing the input character from the cons store:

        k.KO(-1,t,"a3");

This is definitely not consistent between the platforms. Unfortunately, the documentation for context() is not clear on the correct behaviour (and this discrepancy is made even more unclear by the index differences in index() vs context() as documented).

Generally, in this situation, we select the .kmx behaviour to be the intended behaviour. We'll update the documentation accordingly. This behaviour looks to have been in place since 2003 for .kmx:

// 11 Aug 2003 - I25(v6) - mcdurdin - CODE_NUL context support
if(*kkp->dpContext == UC_SENTINEL && *(kkp->dpContext+1) == CODE_NUL)
u16cpy(m_miniContext, /*GLOBAL_ContextStackSize,*/ m_context.Buf(xstrlen_ignoreifopt(kkp->dpContext)-1) /*, GLOBAL_ContextStackSize*/); // I3162 // I3536
else
u16cpy(m_miniContext, /*GLOBAL_ContextStackSize,*/ m_context.Buf(xstrlen_ignoreifopt(kkp->dpContext)) /*, GLOBAL_ContextStackSize*/); // I3162 // I3536

I need to do some more investigation on the history of this discrepancy:

@mcdurdin mcdurdin added this to the B18S2 milestone Feb 19, 2025
@mcdurdin mcdurdin moved this to Todo in Keyman Feb 19, 2025
@mcdurdin
Copy link
Member

This is a longstanding bug. I tested with 15.0.274 and got the same results.

The problem is not impacted by #13003, although I am surprised that I didn't pick this up when I was addressing that bug.

  • are there any keyboards in the Keyman keyboards repository which are impacted by this?

regex search: ^\s*nul(.|\\\n)+context\(

No published keyboards in the repository impacted by this bug. No test keyboards in this repository that test this bug. So it looks novel and we should be able to address this without significant ramifications.

@mcdurdin mcdurdin changed the title bug(web): Context(n) counts the nul statement as a character in Web & android bug(developer): Context(n) counts the nul statement as a character in Web & android Feb 20, 2025
@mcdurdin mcdurdin changed the title bug(developer): Context(n) counts the nul statement as a character in Web & android bug(developer): context(n) counts the nul statement as a character in Web & android Feb 20, 2025
@mcdurdin
Copy link
Member

The context(n) rule does not include nul in kmx (unlike if()). This consistency issue is a problem in the language (@markcsinclair FYI).

nul 'a' any(cons) > '12' context(2) '3'   c TYPE: es (nul context req)
'x' 'a' any(cons) > '45' context(3) '6'   c TYPE: xes

Note that usage of if() ... context() appears to be wrong in every single instance in the keyboards repository (regex search: \b(nul|if\().+context\():

13 results - 9 files

release\a\armenian_mnemonic\source\armenian_mnemonic.kmn:
  1183: if(option_key_t_letter_tyun != '1') dk(small_to) notany(t_base_keys) > U+0569 context(3)
  1186: if(option_key_t_letter_tyun != '1') dk(capital_to) notany(t_base_keys) > U+0539 context(3)
  1189: if(option_key_t_letter_tyun = '1') dk(small_tyun) notany(t_base_keys) > U+057F context(3)
  1192: if(option_key_t_letter_tyun = '1') dk(capital_tyun) notany(t_base_keys) > U+054F context(3)

release\gff\gff_awngi_xamtanga\source\gff_awngi_xamtanga.kmn:
  236:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_blin\source\gff_blin.kmn:
  239:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_ethiopic\source\gff_ethiopic.kmn:
  333:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_gurage\source\gff_gurage.kmn:
  241:   if( &layer = "ዐ-extra" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)
  282:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_harari\source\gff_harari.kmn:
  272:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigre\source\gff_tigre.kmn:
  238:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigrinya_eritrea\source\gff_tigrinya_eritrea.kmn:
  240:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigrinya_ethiopia\source\gff_tigrinya_ethiopia.kmn:
  240:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

This suggests that perhaps it would be better to bring context(n) in line with index() -- as the existing keyboards will not be behaving correctly anyway. It would be a cleaner long-term outcome for keyboard authors as the current discrepancy is pretty hideous.

@mcdurdin
Copy link
Member

mcdurdin commented Feb 20, 2025

I've completed some further analysis on this bug and it reinforces my previous comment -- it would be better to move towards implementing a fix in Keyman Core for this specific case of nul + context(n).

We have the following possible cases with context(n). Looking at the compiler outputs and the implementations in Keyman Core and KeymanWeb, it actually starts to look like the Keyman Core implementation and the documentation for context(n) are the outliers.

This is the test keyboard in question:

store(&TARGETS) 'any'

begin Unicode > use(main)

store(cons) 'mn'
store(ifx) '1'

group(main) using keys

c nul any(cons) + 'a' > 'x' context(1) 'z'

any(cons) + 'a' > '1' context(1)
nul any(cons) + 'b' > '2' context(2)    c <-- failing in debugger, 18.0.190-beta (and in Windows)
'x' any(cons) + 'c' > '3' context(2)
if(ifx = '1') any(cons) + 'd' > '4' context(2)

any > context(1)

any(cons) + 'a' > '1' context(1)
  • context: UC_SENTINEL CODE_ANY[0001] U+0005
  • output: U+0031 UC_SENTINEL CODE_CONTEXTEX[0011] U+0001
  if(k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 12
    k.KO(1,t,"1");
    k.KIO(-1,this.s_cons_4,1,t);
  }

nul any > context(2)

nul any(cons) + 'b' > '2' context(2)   
  • context: UC_SENTINEL CODE_NUL[0004] UC_SENTINEL CODE_ANY[0001] U+0005
  • output: U+0032 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002

With v9 kmw compiler mode (KN, KA, etc):

  if(k.KN(1,t)&&k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 13
    k.KO(1,t,"2");
    k.KIO(-1,this.s_cons_4,1,t);
  }

Note how k.kIO() has an offset parameter == 1 (so kmw implementation ignores nul but keyboard doesn't need to take it into acccount).

With v10 kmw compiler mode (KFCM, e.g. store(&version) '17.0'), unfortunately, this is incorrect:

  if(k.KFCM(2,t,[{t:'n'},{t:'a',a:this.s_cons_5}])){
    r=m=1;   // Line 14
    k.KDC(1,t);
    k.KO(-1,t,"2");
    k.KIO(-1,this.s_cons_5,1,t);
  }

Note: line number and store number differences are not relevant (due to additional store(&version)).

Note that k.KIO() has an offset parameter == 1, but in this case, KMW is adjusting for nul and so k.KIO does not emit anything, and the console warning Unmatched contextual index() statement detected in rule with index 1! is written to the console log.

char any > context(2)

'x' any(cons) + 'c' > '3' context(2)
  • context: U+0078 UC_SENTINEL CODE_ANY[0001] U+0005
  • output: U+0033 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002
  if(k.KCM(2,t,"x",1)&&k.KA(1,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 14
    k.KO(2,t,"3");
    k.KIO(-1,this.s_cons_4,2,t);
  }

Note how k.kIO() has an offset parameter == 2.

if any > context(2)

if(ifx = '1') any(cons) + 'd' > '4' context(2)
  • context: UC_SENTINEL CODE_IFOPT[0014] U+0006 U+0002 U+0009 UC_SENTINEL CODE_ANY[0001] U+0005
  • output: U+0034 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002
  if(this.s_ifx_5===this.s8&&k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 15
    k.KO(1,t,"4");
    k.KIO(-1,this.s_cons_4,1,t);
  }

Note how k.kIO() has an offset parameter == 1 (so kmw implementation ignores if() but keyboard doesn't need to take it into acccount)

Summary

If we move away from context(n) having a different n value than index(x,n) for if and nul examples, then:

  • .kmx compiler emits consistent offsets.
  • .js compiler handles nul and if() correctly for the KMW engine, meaning web works correctly too.
  • Which means, that it's Keyman Core's implementation of this nul+context(n) scenario that is incorrect
  • The documentation for context(n) is also wrong: if() ... > context(n) should be the same as if() ... > index(store,n)

Resolution

See:

Workarounds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

2 participants