xkb_unicode.c keysymtab can be optimized

I crunched the numbers and found that _glfwKeySym2Unicode in xkb_unicode.c, used
when no input method is available, performs a binary search with complexity O(log(n)) over a keysymtab of 828 elements,
where each element occupies 4 bytes (a pair of two shorts), so 828 x 4 = 3312 bytes, which is more than necessary…
with this kind of binary search, a lookup takes up to log2(828) ≈ 9.7, i.e. about 10 comparisons

you can reduce this to 2424 bytes of memory (1212 two-byte entries) if you store only the Unicode values in one linear array, rather than keysym/ucs pairs, and have a function that calculates the index directly, giving O(1) table access…

the trick is to pad the gaps within a keysym group with zeroes, so that every keysym in the group maps straight to the slot holding its Unicode value…

for example:

static const unsigned short keysymtab[] =
{
    // XK_LATIN2 (57 Unicode values + 38 padding entries)
    // 0x01a1 - 0x01ff
    0x0104, 0x02d8, 0x0141, 0x0000, 0x013d, 0x015a, 0x0000, 0x0000,
    0x0160, 0x015e, 0x0164, 0x0179, 0x0000, 0x017d, 0x017b, 0x0000,
    0x0105, 0x02db, 0x0142, 0x0000, 0x013e, 0x015b, 0x02c7, 0x0000,
    0x0161, 0x015f, 0x0165, 0x017a, 0x02dd, 0x017e, 0x017c, 0x0154,
    0x0000, 0x0000, 0x0102, 0x0000, 0x0139, 0x0106, 0x0000, 0x010c,
    0x0000, 0x0118, 0x0000, 0x011a, 0x0000, 0x0000, 0x010e, 0x0110,
    0x0143, 0x0147, 0x0000, 0x0000, 0x0150, 0x0000, 0x0000, 0x0158,
    0x016e, 0x0000, 0x0170, 0x0000, 0x0000, 0x0162, 0x0000, 0x0155,
    0x0000, 0x0000, 0x0103, 0x0000, 0x013a, 0x0107, 0x0000, 0x010d,
    0x0000, 0x0119, 0x0000, 0x011b, 0x0000, 0x0000, 0x010f, 0x0111,
    0x0144, 0x0148, 0x0000, 0x0000, 0x0151, 0x0000, 0x0000, 0x0159,
    0x016f, 0x0000, 0x0171, 0x0000, 0x0000, 0x0163, 0x02d9,
    // ...
};
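
as a quick sanity check: XK_sacute is 0x01b6 in <X11/keysymdef.h> and maps to U+015B, and indeed:

    // index = 0x01b6 - 0x01a1 = 21, and keysymtab[21] == 0x015b
    unsigned short ucs = keysymtab[0x01b6 - 0x01a1];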

you pad it with zeroes so that, as the keysym counts up, each Unicode value is indexed correctly…
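
note that a lookup which lands on one of the padding entries has to be rejected, since 0x0000 is not a valid mapping; a minimal sketch (lookupLatin2 is just an illustrative name):

    // hypothetical helper: a padded slot holds 0x0000, meaning no mapping
    unsigned int lookupLatin2(unsigned int keysym)
    {
        const unsigned short ucs = keysymtab[keysym - 0x01a1];
        return ucs ? ucs : GLFW_INVALID_CODEPOINT;
    }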

This will still require a binary search for the keysym in a separate table that contains offsets into this padded table, right?

no, as I mentioned, it is O(1) constant access, without binary search; all ucs values are stored in a linear array,
and there is a trivial way to calculate the index…

Yes, I understand the ucs values are in a linear array. But how do you know where they start for each specific keysym? Like in your example, how do you know 0x01a1 starts at index 0? And 0x02a1 would be 95, I assume?

they are defined in <X11/keysymdef.h>; Latin1 keysyms are mapped 1:1 to Unicode,
so you would normally start the table with Latin2…
you calculate an offset that starts at 0x01a1,
so e.g.:

unsigned int getIndex(unsigned int keysym)
{
    size_t offset = 0x01a1;

    // LATIN2: 0x01a1 - 0x01ff
    if (keysym >= 0x01a1 && keysym <= 0x01ff)
        return keysymtab[keysym - offset];

    offset += (0x02a1 - 0x01ff - 1); // skip the gap between LATIN2 and LATIN3

    // LATIN3: 0x02a1 - 0x02fe
    if (keysym >= 0x02a1 && keysym <= 0x02fe)
        return keysymtab[keysym - offset];

    offset += (0x03a2 - 0x02fe - 1); // skip the gap between LATIN3 and LATIN4

    // LATIN4: 0x03a2 - 0x03fe
    if (keysym >= 0x03a2 && keysym <= 0x03fe)
        return keysymtab[keysym - offset];

    // you do this for any charset you want to add, but they have to be in ascending order..
    // because there is a huge gap between different charsets, you simply do not pad
    // across them, but check for the respective range directly..
    // of course, 24-bit Unicode keysyms require special handling: AND with the mask
    // 0xff000000 and check that the result is 0x01000000, as has been implemented
    // in the original compilation unit..

    return GLFW_INVALID_CODEPOINT;
}
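
for clarity, the two fast paths mentioned in the comments would sit at the top of getIndex(), before the range checks, along the lines of the original implementation:

    // directly encoded 24-bit Unicode keysyms: 0x01000000 | codepoint
    if ((keysym & 0xff000000) == 0x01000000)
        return keysym & 0x00ffffff;

    // Latin1 is mapped 1:1, so it needs no table entries
    if ((keysym >= 0x0020 && keysym <= 0x007e) ||
        (keysym >= 0x00a0 && keysym <= 0x00ff))
        return keysym;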

so it is a tedious thing to do, because
you have to work carefully so that the indexing stays correct, but in effect it should reduce
the overall overhead…

That is kind of worse than binary search, as it is doing an O(n) search: on average it performs N/2 range checks before knowing where to look up the keysym.

That’s why I was asking if there would be another binary search:

unsigned int getIndex(unsigned int keysym)
{
    if (keysym < 0x8000) {
        if (keysym < 0x4000) {
            if (keysym < 0x2000) { // 0x0000 to 0x2000
                ... // repeat ifs until you get to the 0x0000 to 0x0200 range
                if (keysym < 0x01a1) return GLFW_INVALID_CODEPOINT;
                return keysymtab[keysym - 0x01a1]; // LATIN2
            } else { // 0x2000 to 0x4000
                ...
            }
            ...
        }
        ...
    }
    return GLFW_INVALID_CODEPOINT;
}

This performs O(log(N)) checks before knowing which offset to use for the lookup.

Or alternatively, put the ranges in an array for smaller code size:

    struct Offset { short first, last, offset; } offsets[] = 
    {
        { 0x01a1, 0x01ff, 0 }, // LATIN2
        { 0x02a1, 0x02fe, 95 }, // LATIN3
        ...
    };

and then do a binary search over the offsets to find the entry for the keysym:

unsigned int getIndex(unsigned int keysym)
{
    // find the entry where offset->first <= keysym && keysym <= offset->last
    Offset* offset = binary_search(keysym);

    // return the lookup if found
    if (offset) { return keysymtab[offset->offset + keysym - offset->first]; }

    return GLFW_INVALID_CODEPOINT;
}

And this function will perform O(log(N)) checks, better than the O(N) checks you propose.
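
For completeness, that binary_search helper could be a classic range search over the sorted offsets[] array (a sketch; the name and the NULL-on-miss behavior are just assumptions from the snippet above):

Offset* binary_search(unsigned int keysym)
{
    size_t lo = 0, hi = sizeof(offsets) / sizeof(offsets[0]);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keysym < (unsigned int) offsets[mid].first)
            hi = mid;                // keysym is below this range
        else if (keysym > (unsigned int) offsets[mid].last)
            lo = mid + 1;            // keysym is above this range
        else
            return &offsets[mid];    // keysym falls inside this range
    }
    return NULL;                     // keysym is in no known range
}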

first, you are using a binary search on 828 elements, but in the linear variation you do
a linear search over n charsets, not over 828 short pairs… the index calculation has complexity O(N)
in the number of charsets, not the access time, since you are literally calculating a random-access index…
suppose you have these three character sets:

XK_LATIN2
0x01a1 - 0x01ff

XK_LATIN4
0x03a2 - 0x03fe

XK_KATAKANA
0x047e - 0x04df

then you have an O(3) index calculation for 3 character sets with O(1) access time,
but if you used the keypair variation, then you have 112 elements from XK_LATIN2 0x01a1
to XK_KATAKANA 0x04df without padding, as counted in the original xkb_unicode.c implementation, so O(log(112)) is still larger than O(N = 3) + O(1)
for 3 character sets…
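
spelled out for those three blocks, the direct-indexing lookup would look something like this (a sketch; getIndex3 is a hypothetical name, and the block offsets 0, 95 and 188 follow from the padded block sizes 95, 93 and 98):

unsigned int getIndex3(unsigned int keysym)
{
    // the padded table packs LATIN2 (indices 0..94),
    // LATIN4 (95..187) and KATAKANA (188..285) back to back
    if (keysym >= 0x01a1 && keysym <= 0x01ff)
        return keysymtab[keysym - 0x01a1];        // XK_LATIN2
    if (keysym >= 0x03a2 && keysym <= 0x03fe)
        return keysymtab[keysym - 0x03a2 + 95];   // XK_LATIN4
    if (keysym >= 0x047e && keysym <= 0x04df)
        return keysymtab[keysym - 0x047e + 188];  // XK_KATAKANA
    return GLFW_INVALID_CODEPOINT;
}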

You misunderstood me. I’m not suggesting doing O(log(112)). For your three-block example I’m suggesting O(log(3)) + O(1).

Obviously, if you have just three blocks of keysyms to look up, then there’s no reason for O(log(3)), and O(3) would be just fine - a linear search will handle something that small without any problems. But for larger counts I would suggest doing a binary search for the offset, and only then doing your O(1) lookup.

yes, sure, this version of getIndex() was just an example; you can apply a binary search to find the right offset even faster, but the overall idea of direct indexing is, I think, the more beneficial part

but I also think there aren’t so many charsets defined in keysymdef.h that a linear search would be too slow; still, binary search could be better

here is an experiment:

the test confirms that the indexing-array variation is faster on average than the mapped version…
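
(for reference, a micro-benchmark along these lines could reproduce such a comparison; sym2ucs_pairs and sym2ucs_index are hypothetical names for the pair-table and padded-array variants:)

#include <stdio.h>
#include <time.h>

unsigned int sym2ucs_pairs(unsigned int keysym); // binary search over keysym/ucs pairs
unsigned int sym2ucs_index(unsigned int keysym); // direct indexing into the padded array

int main(void)
{
    volatile unsigned long sink = 0; // prevent the lookups from being optimized away
    clock_t t0, t1, t2;
    int i;
    unsigned int k;

    t0 = clock();
    for (i = 0; i < 100000; i++)
        for (k = 0x01a1; k <= 0x04df; k++)
            sink += sym2ucs_pairs(k);
    t1 = clock();
    for (i = 0; i < 100000; i++)
        for (k = 0x01a1; k <= 0x04df; k++)
            sink += sym2ucs_index(k);
    t2 = clock();

    printf("pairs: %ld ticks, index: %ld ticks\n", (long)(t1 - t0), (long)(t2 - t1));
    return 0;
}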