Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match all tokens should look in different keys as well. #235

Closed
masimplo opened this issue May 10, 2018 · 15 comments
Closed

Match all tokens should look in different keys as well. #235

masimplo opened this issue May 10, 2018 · 15 comments

Comments

@masimplo
Copy link

masimplo commented May 10, 2018

From what I understand using tokenize with match all tokens currently forces a match of all tokens on the same key. Would it be possible to force a match of all tokens across any keys of a single match?

e.g.

     {
        title: "Old Man's War",
        author: {
          firstName: "John",
          lastName: "Scalzi"
        }
     }

searching for Old War will match this record, but Old War John will not.

This would be useful in searches where you want two or more attributes of a record to match.
i.e.

{
    jobtitle: 'Lawyer',
    country: 'France'
},
{
    jobtitle: 'Developer',
    country: 'France'
}

There should be a way for a search to return only the first record when search for "lawyer france"

@krisk
Copy link
Owner

krisk commented May 26, 2018

Interesting. Need think on this some more 😅.

@MenesesEvandro
Copy link

MenesesEvandro commented Jul 24, 2018

This would be awesome! =)

Opens a lot of new possibilities!!

@thediveo
Copy link

thediveo commented Aug 15, 2018

I've hit the same problem and am currently working around this limitation by using my own getFn which simply bangs all fields together into a single string ... this requires another hack, where I allow multiple fields to be specified in a single key, separated by ",". Yes, an ugly hack, but gets the job done.

@Sigmus
Copy link

Sigmus commented Aug 16, 2018

Why do you think it's an ugly hack? That's something that could potentially become a feature.

@thediveo
Copy link

One does not preclude the other... ;)

@Sigmus
Copy link

Sigmus commented Aug 16, 2018

Show us the code!

@thediveo
Copy link

One downside I see is that reporting hit ranges will not work, because these relate to the virtual combined item, not the real key items.

@jjsiman
Copy link

jjsiman commented Apr 8, 2019

Could really use this feature for my current project! Would love to see it implemented.

@ioRekz
Copy link

ioRekz commented Oct 29, 2019

This would be really helpful

@github-actions
Copy link

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@unchewyglees
Copy link

Any update on this? I think searching across fields would be a common task like the examples stated above by @masimplo and I'm not sure if my solution would be a recommended implementation based on best practices.

Between @thediveo's solution with the getFn and this on StackOverflow https://stackoverflow.com/questions/51187276/fuse-js-retrieve-records-which-exactly-match-a-multiple-word-search that adds a keywords field as an array of the desired search fields, the solution that worked for me was a combination.

I memoized my search list with a new field "keywords" as an array of the search fields concatenated by a space in different orders.

Example:

const list = [{
    title: "Old Man's War",
    author: {
        firstName: "John",
        lastName: "Scalzi"
    }
}];

would be defined to Fuse instead as

const newList = [{
    title: "Old Man's War",
    author: {
        firstName: "John",
        lastName: "Scalzi"
    },
    keywords: [
        "Old Man's War John Scalzi",
        "John Scalzi Old Man's War"
    ]
}];

by doing

const newList = list.map((item) => ({
    ...item,
    keywords: [
        `${item.title} ${item.author.firstName} ${item.author.lastName}`,
        `${item.author.firstName} ${item.author.lastName} ${item.title}`
    ]
}));

I found that an array of just the search fields without concatenating them did not work when I tried it. For example, it did not work when I did this:

const newList = list.map((item) => ({
   ...item,
    keywords: [item.title, item.author.firstName, item.author.lastName]
}));

I found that my search word order would matter without having the order combinations in the keywords array.
For example, "lawyer france" would not match if I defined the keywords field with only ${item.country} ${item.jobtitle} unless I also added ${item.jobtitle} ${item.country} as well. This way, searches as either "lawyer france" or "france lawyer", would return the expected match.

{
    jobtitle: 'Lawyer',
    country: 'France'
}

In my opinion, this only works for minimal fields as order combinations would grow exponentially, and,
as @thediveo mentioned, the downside is location matching/highlighting will be different if it's a concern.

Also, I'm not sure if there are any Fuse options that would have allowed my solution to work differently, but they were left as their defaults.

@0xdevalias
Copy link

0xdevalias commented May 28, 2021

For anyone who ends up here from a google search or otherwise, we ended up taking a different approach in sparkletown/sparkle#1460 (thanks to @yarikoptic's awesome work debugging, exploring, and refining this)

We basically split our search query using regex (tokeniseStringWithQuotesBySpaces), to tokenise each individual word, but keep words that are between " and " as a single token):

/**
 * Split the provided string by spaces (ignoring spaces within "quoted text") into an array of tokens.
 *
 * @param string
 *
 * @see https://stackoverflow.com/a/16261693/1265472
 *
 * @debt Depending on the outcome of https://github.com/github/codeql/issues/5964 we may end up needing to change
 *   this regex for performance reasons.
 */
export const tokeniseStringWithQuotesBySpaces = (string: string): string[] =>
  string.match(/("[^"]*?"|[^"\s]+)+(?=\s*|\s*$)/g) ?? [];

(Note: Please check github/codeql#5964 as the regex may have a ReDoS vulnerability, but it also might just be a false positive in the CodeQL scanner)

With our standard Fuse config:

      new Fuse(filteredPosterVenues, {
        keys: [
          "name",
          "poster.title",
          "poster.authorName",
          "poster.categories",
        ],
        threshold: 0.2, // 0.1 seems to be exact, default 0.6: brings too distant if anyhow related hits
        ignoreLocation: true, // default False: True - to search ignoring location of the words.
        findAllMatches: true,
      }),

But then use our tokeniseStringWithQuotesBySpaces tokeniser + customised Fuse query (using $and to join each of our tokens, then $or for the different fields) for the search:

const tokenisedSearchQuery = tokeniseStringWithQuotesBySpaces(
  normalizedSearchQuery
);

if (tokenisedSearchQuery.length === 0) return filteredPosterVenues;

return fuseVenues
  .search({
    $and: tokenisedSearchQuery.map((searchToken: string) => {
      const orFields: Fuse.Expression[] = [
        { name: searchToken },
        { "poster.title": searchToken },
        { "poster.authorName": searchToken },
        { "poster.categories": searchToken },
      ];

      return {
        $or: orFields,
      };
    }),
  })
  .map((fuseResult) => fuseResult.item);

This seems to work pretty effectively for our needs from my testing of it all today.

@adlerfaulkner
Copy link

adlerfaulkner commented Dec 14, 2021

Note to anyone using the solution above by @0xdevalias: There is a bug in scoring when searching the same token across multiple keys because the FIRST match's score is used, not the BEST match.

I fixed, tested, and wrote more about the bug here: #593

Edit:
Until @krisk reviews, my fork with this fix in master can be found here https://github.com/comake/Fuse

@titenis
Copy link

titenis commented Feb 22, 2022

Is this lib not maintained anymore? I cant believe that 4 years passed and this is still not implemented, this issue is even not open lol. I think proposed solutions, when you have to assemble keyword field yourself, really defeats the purpose of consuming this lib at all.

@TheCodemate
Copy link

Another 2 years passed by and I guess it's still not there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests