Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching bytes backward #239

Closed
ncannasse opened this issue Mar 31, 2019 · 2 comments
Closed

Searching bytes backward #239

ncannasse opened this issue Mar 31, 2019 · 2 comments

Comments

@ncannasse
Copy link
Member

We currently have Bytes.find that allows searching bytes forward, but String.lastIndexOf would require searching bytes backwards.

Also, we might want to have some optimization when searching for very small patterns such as a single 16 bits char.

@nanjizal
Copy link
Contributor

Just had a look inside and presume first step might be to adjust the for loop, perhaps:

for (match_base = limit;
         match_base > block;
         match_base -= shift [*(match_base + pattern_size)])
      {

So could add a memfindBackwards_rb or otherwise named.

typedef unsigned char byte;
static void *
memfindBackwards_rb (const void  *in_block,      /*  Block containing data            */
            size_t       block_size,    /*  Size of block in bytes           */
            const void  *in_pattern,    /*  Pattern to search for            */
            size_t       pattern_size,  /*  Size of pattern block            */
            size_t      *shift,         /*  Shift table (search buffer)      */
            bool        *repeat_find)   /*  TRUE: search buffer already init */
{
    size_t
        byte_nbr,                       /*  Distance through block           */
        match_size;                     /*  Size of matched part             */
    const byte
        *match_base = NULL,             /*  Base of match of pattern         */
        *match_ptr  = NULL,             /*  Point within current match       */
        *limit      = NULL;             /*  Last potiental match point       */
    const byte
        *block   = (byte *) in_block,   /*  Concrete pointer to block data   */
        *pattern = (byte *) in_pattern; /*  Concrete pointer to search value */

    if (block == NULL || pattern == NULL || shift == NULL)
        return (NULL);

    /*  Pattern must be smaller or equal in size to string                   */
    if (block_size < pattern_size)
        return (NULL);                  /*  Otherwise it's not found         */

    if (pattern_size == 0)              /*  Empty patterns match at start    */
        return ((void *)block);

    /*  Build the shift table unless we're continuing a previous search      */

    /*  The shift table determines how far to shift before trying to match   */
    /*  again, if a match at this point fails.  If the byte after where the  */
    /*  end of our pattern falls is not in our pattern, then we start to     */
    /*  match again after that byte; otherwise we line up the last occurence */
    /*  of that byte in our pattern under that byte, and try match again.    */

    if (!repeat_find || !*repeat_find)
      {
        for (byte_nbr = 0; byte_nbr < 256; byte_nbr++)
            shift [byte_nbr] = pattern_size + 1;
        for (byte_nbr = 0; byte_nbr < pattern_size; byte_nbr++)
            shift [(byte) pattern [byte_nbr]] = pattern_size - byte_nbr;

        if (repeat_find)
            *repeat_find = true;
      }

    /*  Search for the block, each time jumping up by the amount             */
    /*  computed in the shift table                                          */

limit = block + (block_size - pattern_size + 1);
for (match_base = limit;
         match_base > block;
         match_base -= shift [*(match_base + pattern_size)])
      {
        match_ptr  = match_base;
        match_size = 0;

        /*  Compare pattern until it all matches, or we find a difference    */
        while (*match_ptr++ == pattern [match_size++])
          {
            /*  If we found a match, return the start address                */
            if (match_size >= pattern_size)
              return ((void*)(match_base));

          }
}

No idea if that's helpful or just nieve, perhaps if it might be useful ? ... I should actually try taking it further but I hate setting up builds. But don't want to get in the way if I'm completely off track.

@nanjizal
Copy link
Contributor

mm shift probably won't work like that I'll stop talking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants