bug(developer): `context(n)` counts the `nul` statement as a character in Web & android #13276

O1Anas · 2025-02-18T18:20:35Z

I use an arabic phonetic keyboard that I made on my own, It is documented here.
I am making a mobile version of it and I'm facing an issue/bug:

At this line:

nul 'ْ' any(cons) > 'ال' context(2) 'ّ'

I typed at the beginning of the line: "es"/"ْس" (ْ س)
In the web server/link (KeymanWeb), I recieved: "الّْ" (ا ل ْ ّ) (the context(n) statement in the output probably handled the nul statement in the input as a character/countable block)
I expected: "السّ" (ا ل س ّ) (This is what I got in the keyman developer debugger)

On the other hand, at this line:

nul 'ْ' any(cons) > 'ال' context(3) 'ّ'

I recieved: "الّ" (ا ل ّ) (Now nul wasn't counted for some reason)

This could be another issue/bug where any() statements aren't saved to get called back with the context(n) statement
This bug is consistent in all other rules, like this one for example:

nul 'ل' any(diacritic) 'ْ' any(cons) > 'ل' context(3) 'ل' context(5) 'ّ'

I type: "lies"/"لِْس" (ل ِ ْ س)
I recieve: "للّ" (ل ل ّ) (It ignores all context(n) statements when outputting)

I genuinely can't find an explanation or a fix for these issues, please find a fix for them or give me the solution that's already been found :)

Related issues

#12987

& maybe:
#13040

Keyman apps

Keyman for Android
KeymanWeb

Keyman version

tested on V17.0 & V18.0.193

Language name

Arabic (my own custom keyboard)

Additional context

The behaviour is normal when debugging in keyman developer

The text was updated successfully, but these errors were encountered:

mcdurdin · 2025-02-19T08:52:26Z

Thank you for this clear and detailed report. I have managed to reproduce this issue with Keyman Developer 18.0.190-beta (will be same as .193) and have built a minimal reproduction with the following keyboard.

To reproduce the problem, type es:

On kmx-based platforms, this produces '12b3'.
On web-based platforms, this produces '12a3'.

store(&TARGETS) 'any'

begin Unicode > use(main)

store(cons) 'bcd'

group(main) using keys

+ 'e' > 'a'
+ 's' > 'b'
match > use(post)

group(post)

nul 'a' any(cons) > '12' context(2) '3'

The web compiler produces the following function for the post group in test_13276.js:

  this.g_post_1=function(t,e) {
    var k=KeymanWeb,r=1,m=0;
      if(k.KN(2,t)&&k.KCM(2,t,"a",1)&&k.KA(1,k.KC(1,1,t),this.s_cons_4)){
        m=1;   // Line 15
        k.KO(2,t,"12");
        k.KO(-1,t,"a3");
      }
    return r;
  };

KMX compiler produces the following for that rule:

context: UC_SENTINEL CODE_NUL[0004] U+0061 UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0031 U+0032 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002 U+0033

We can see how the compiler has selected "a" as the character to output, rather than referencing the input character from the cons store:

        k.KO(-1,t,"a3");

This is definitely not consistent between the platforms. Unfortunately, the documentation for context() is not clear on the correct behaviour (and this discrepancy is made even more unclear by the index differences in index() vs context() as documented).

Generally, in this situation, we select the .kmx behaviour to be the intended behaviour. We'll update the documentation accordingly. This behaviour looks to have been in place since 2003 for .kmx:

keyman/core/src/kmx/kmx_processevent.cpp

Lines 340 to 344 in 0958a2d

    
           // 11 Aug 2003 - I25(v6) - mcdurdin - CODE_NUL context support 
        
           if(*kkp->dpContext == UC_SENTINEL && *(kkp->dpContext+1) == CODE_NUL) 
        
             u16cpy(m_miniContext, /*GLOBAL_ContextStackSize,*/ m_context.Buf(xstrlen_ignoreifopt(kkp->dpContext)-1) /*, GLOBAL_ContextStackSize*/);  // I3162   // I3536 
        
           else 
        
             u16cpy(m_miniContext, /*GLOBAL_ContextStackSize,*/ m_context.Buf(xstrlen_ignoreifopt(kkp->dpContext)) /*, GLOBAL_ContextStackSize*/);  // I3162   // I3536

I need to do some more investigation on the history of this discrepancy:

is this new in v17 (rewritten web compiler) or v18 (fix(developer): produce correct index parameter value for context in kmw compiler #13003) or a longstanding discrepancy?
is this impacted by the changes in fix(developer): produce correct index parameter value for context in kmw compiler #13003?
are there any keyboards in the Keyman keyboards repository which are impacted by this?

mcdurdin · 2025-02-19T23:00:57Z

is this new in v17 (rewritten web compiler) or v18 (fix(developer): produce correct index parameter value for context in kmw compiler #13003) or a longstanding discrepancy?

This is a longstanding bug. I tested with 15.0.274 and got the same results.

is this impacted by the changes in fix(developer): produce correct index parameter value for context in kmw compiler #13003?

The problem is not impacted by #13003, although I am surprised that I didn't pick this up when I was addressing that bug.

are there any keyboards in the Keyman keyboards repository which are impacted by this?

regex search: ^\s*nul(.|\\\n)+context\(

No published keyboards in the repository impacted by this bug. No test keyboards in this repository that test this bug. So it looks novel and we should be able to address this without significant ramifications.

mcdurdin · 2025-02-20T05:00:40Z

The context(n) rule does not include nul in kmx (unlike if()). This consistency issue is a problem in the language (@markcsinclair FYI).

nul 'a' any(cons) > '12' context(2) '3'   c TYPE: es (nul context req)
'x' 'a' any(cons) > '45' context(3) '6'   c TYPE: xes

Note that usage of if() ... context() appears to be wrong in every single instance in the keyboards repository (regex search: \b(nul|if\().+context\():

13 results - 9 files

release\a\armenian_mnemonic\source\armenian_mnemonic.kmn:
  1183: if(option_key_t_letter_tyun != '1') dk(small_to) notany(t_base_keys) > U+0569 context(3)
  1186: if(option_key_t_letter_tyun != '1') dk(capital_to) notany(t_base_keys) > U+0539 context(3)
  1189: if(option_key_t_letter_tyun = '1') dk(small_tyun) notany(t_base_keys) > U+057F context(3)
  1192: if(option_key_t_letter_tyun = '1') dk(capital_tyun) notany(t_base_keys) > U+054F context(3)

release\gff\gff_awngi_xamtanga\source\gff_awngi_xamtanga.kmn:
  236:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_blin\source\gff_blin.kmn:
  239:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_ethiopic\source\gff_ethiopic.kmn:
  333:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_gurage\source\gff_gurage.kmn:
  241:   if( &layer = "ዐ-extra" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)
  282:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_harari\source\gff_harari.kmn:
  272:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigre\source\gff_tigre.kmn:
  238:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigrinya_eritrea\source\gff_tigrinya_eritrea.kmn:
  240:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

release\gff\gff_tigrinya_ethiopia\source\gff_tigrinya_ethiopia.kmn:
  240:   if( &layer = "ዐ-layer" ) 'ዐ' + any(touchVowels) > context(2) index(ዐ_ቤተሰብ,3)

This suggests that perhaps it would be better to bring context(n) in line with index() -- as the existing keyboards will not be behaving correctly anyway. It would be a cleaner long-term outcome for keyboard authors as the current discrepancy is pretty hideous.

mcdurdin · 2025-02-20T07:06:08Z

I've completed some further analysis on this bug and it reinforces my previous comment -- it would be better to move towards implementing a fix in Keyman Core for this specific case of nul + context(n).

We have the following possible cases with context(n). Looking at the compiler outputs and the implementations in Keyman Core and KeymanWeb, it actually starts to look like the Keyman Core implementation and the documentation for context(n) are the outliers.

This is the test keyboard in question:

store(&TARGETS) 'any'

begin Unicode > use(main)

store(cons) 'mn'
store(ifx) '1'

group(main) using keys

c nul any(cons) + 'a' > 'x' context(1) 'z'

any(cons) + 'a' > '1' context(1)
nul any(cons) + 'b' > '2' context(2)    c <-- failing in debugger, 18.0.190-beta (and in Windows)
'x' any(cons) + 'c' > '3' context(2)
if(ifx = '1') any(cons) + 'd' > '4' context(2)

any > context(1)

any(cons) + 'a' > '1' context(1)

context: UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0031 UC_SENTINEL CODE_CONTEXTEX[0011] U+0001

  if(k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 12
    k.KO(1,t,"1");
    k.KIO(-1,this.s_cons_4,1,t);
  }

nul any > context(2)

nul any(cons) + 'b' > '2' context(2)

context: UC_SENTINEL CODE_NUL[0004] UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0032 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002

With v9 kmw compiler mode (KN, KA, etc):

  if(k.KN(1,t)&&k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 13
    k.KO(1,t,"2");
    k.KIO(-1,this.s_cons_4,1,t);
  }

Note how k.kIO() has an offset parameter == 1 (so kmw implementation ignores nul but keyboard doesn't need to take it into acccount).

With v10 kmw compiler mode (KFCM, e.g. store(&version) '17.0'), unfortunately, this is incorrect:

  if(k.KFCM(2,t,[{t:'n'},{t:'a',a:this.s_cons_5}])){
    r=m=1;   // Line 14
    k.KDC(1,t);
    k.KO(-1,t,"2");
    k.KIO(-1,this.s_cons_5,1,t);
  }

Note: line number and store number differences are not relevant (due to additional store(&version)).

Note that k.KIO() has an offset parameter == 1, but in this case, KMW is adjusting for nul and so k.KIO does not emit anything, and the console warning Unmatched contextual index() statement detected in rule with index 1! is written to the console log.

char any > context(2)

'x' any(cons) + 'c' > '3' context(2)

context: U+0078 UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0033 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002

  if(k.KCM(2,t,"x",1)&&k.KA(1,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 14
    k.KO(2,t,"3");
    k.KIO(-1,this.s_cons_4,2,t);
  }

Note how k.kIO() has an offset parameter == 2.

if any > context(2)

if(ifx = '1') any(cons) + 'd' > '4' context(2)

context: UC_SENTINEL CODE_IFOPT[0014] U+0006 U+0002 U+0009 UC_SENTINEL CODE_ANY[0001] U+0005
output: U+0034 UC_SENTINEL CODE_CONTEXTEX[0011] U+0002

  if(this.s_ifx_5===this.s8&&k.KA(0,k.KC(1,1,t),this.s_cons_4)){
    r=m=1;   // Line 15
    k.KO(1,t,"4");
    k.KIO(-1,this.s_cons_4,1,t);
  }

Note how k.kIO() has an offset parameter == 1 (so kmw implementation ignores if() but keyboard doesn't need to take it into acccount)

Summary

If we move away from context(n) having a different n value than index(x,n) for if and nul examples, then:

.kmx compiler emits consistent offsets.
.js compiler handles nul and if() correctly for the KMW engine, meaning web works correctly too.
Which means, that it's Keyman Core's implementation of this nul+context(n) scenario that is incorrect
The documentation for context(n) is also wrong: if() ... > context(n) should be the same as if() ... > index(store,n)

Resolution

See:

bug(common): fix handling of CODE_CONTEXTEX and CODE_INDEX offsets in Developer, Core, and Web #13299

Workarounds

Just copy the output for plain characters:

c instead of:
nul 'xy' + 'a' > context(2) 'z'
c use:
nul 'xy' + 'a' > 'xz'

No workaround is available for a context() reference to notany().
No consistent workaround is available for a index() reference to any() (bug(developer): kmc-kmn kmw compiler generates an incorrect offset for nul ... > index(store,n) when in v10+ mode #13307).

Add validity checks to kmcmplib to verify that offsets provided to `context(n)` are in range for the context, and do not point to non-character elements (`nul` and `if`) in the context string. Fixes: #13301 Fixes: #13302 Relates-to: #13299 Relates-to: #13276

O1Anas added the bug label Feb 18, 2025

keymanapp-test-bot bot added the web/ label Feb 18, 2025

github-project-automation bot added this to Keyman Feb 18, 2025

mcdurdin self-assigned this Feb 19, 2025

mcdurdin added this to the B18S2 milestone Feb 19, 2025

mcdurdin moved this to Todo in Keyman Feb 19, 2025

mcdurdin mentioned this issue Jan 28, 2025

info: Platform support discrepancies #2239

Open

16 tasks

mcdurdin changed the title ~~bug(web): Context(n) counts the nul statement as a character in Web & android~~ bug(developer): Context(n) counts the nul statement as a character in Web & android Feb 20, 2025

keymanapp-test-bot bot added developer/ and removed web/ labels Feb 20, 2025

mcdurdin changed the title ~~bug(developer): Context(n) counts the nul statement as a character in Web & android~~ bug(developer): context(n) counts the nul statement as a character in Web & android Feb 20, 2025

This was referenced Feb 21, 2025

fix(developer): verify context(n) offsets are valid in kmcmplib #13308

Draft

bug(common): fix handling of CODE_CONTEXTEX and CODE_INDEX offsets in Developer, Core, and Web #13299

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug(developer): `context(n)` counts the `nul` statement as a character in Web & android #13276

bug(developer): `context(n)` counts the `nul` statement as a character in Web & android #13276

O1Anas commented Feb 18, 2025 •

edited

Loading

mcdurdin commented Feb 19, 2025 •

edited

Loading

mcdurdin commented Feb 19, 2025

mcdurdin commented Feb 20, 2025

mcdurdin commented Feb 20, 2025 •

edited

Loading

bug(developer): context(n) counts the nul statement as a character in Web & android #13276

bug(developer): context(n) counts the nul statement as a character in Web & android #13276

Comments

O1Anas commented Feb 18, 2025 • edited Loading

At this line:

On the other hand, at this line:

Related issues

Keyman apps

Keyman version

Language name

Additional context

mcdurdin commented Feb 19, 2025 • edited Loading

mcdurdin commented Feb 19, 2025

mcdurdin commented Feb 20, 2025

mcdurdin commented Feb 20, 2025 • edited Loading

any > context(1)

nul any > context(2)

char any > context(2)

if any > context(2)

Summary

Resolution

Workarounds

bug(developer): `context(n)` counts the `nul` statement as a character in Web & android #13276

bug(developer): `context(n)` counts the `nul` statement as a character in Web & android #13276

O1Anas commented Feb 18, 2025 •

edited

Loading

mcdurdin commented Feb 19, 2025 •

edited

Loading

mcdurdin commented Feb 20, 2025 •

edited

Loading