Use Dictionary lookup for supplied IDs to Embedding Operator #148

Merged


oliverholworthy
Member

  • Uses a dictionary lookup for the IDs supplied to the Embedding Operator.
    • This improves the speed of index lookups for larger sets of embeddings.
  • Adds an unknown_value option that sets the default value of the embedding returned for IDs that are not found in the set of pre-trained embeddings.
  • Makes the mmap parameter optional when passing a file. Currently, passing a file without mmap=True raises an unrelated error.
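
The idea behind the first two points can be sketched in plain Python. This is a minimal illustration, not the operator's actual code: the helper names `build_id_index` and `lookup_embeddings` are hypothetical, and the embeddings are plain lists rather than the arrays the operator works with. Only the `unknown_value` name comes from this PR.

```python
from typing import Dict, Hashable, List, Sequence


def build_id_index(ids: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each supplied ID to its row position in the embedding table."""
    return {id_: row for row, id_ in enumerate(ids)}


def lookup_embeddings(
    keys: Sequence[Hashable],
    id_index: Dict[Hashable, int],
    embeddings: Sequence[Sequence[float]],
    unknown_value: float = 0.0,
) -> List[List[float]]:
    """Return one embedding per key, filling unknown IDs with unknown_value."""
    dim = len(embeddings[0]) if embeddings else 0
    result = []
    for key in keys:
        row = id_index.get(key)  # average O(1) per key
        if row is None:
            # ID is not in the pre-trained set: return a constant vector.
            result.append([unknown_value] * dim)
        else:
            result.append(list(embeddings[row]))
    return result


# Hypothetical data: three known IDs with 2-dimensional embeddings.
ids = ["a", "b", "c"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

id_index = build_id_index(ids)  # built once, reused for every transform
looked_up = lookup_embeddings(["c", "x", "a"], id_index, embeddings, unknown_value=0.0)
```

Building the index once moves the cost proportional to the number of IDs out of each transform call, so each lookup no longer depends on how many IDs were supplied.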

Example

Before this change, with 10 million IDs, the operator transform runs in roughly 500-600 milliseconds. This time scales proportionally with the number of IDs.

After this change the operator transform runs in 50-60 microseconds with 10 million IDs.
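
The scaling difference can be seen in a rough, self-contained timing sketch. It assumes the slower path behaves like a linear scan over the ID list, which is an assumption rather than something this PR states. It uses a much smaller, illustrative ID count, and it does not reproduce the figures above. Absolute timings will vary by machine.

```python
import timeit

n_ids = 100_000  # illustrative size, far smaller than the 10 million above
ids = list(range(n_ids))
id_index = {id_: row for row, id_ in enumerate(ids)}

target = n_ids - 1  # last ID: the worst case for a linear scan

# Both approaches must agree on the row position.
linear_row = ids.index(target)  # O(n) scan through the list
dict_row = id_index[target]     # average O(1) hash lookup

linear_seconds = timeit.timeit(lambda: ids.index(target), number=20)
dict_seconds = timeit.timeit(lambda: id_index[target], number=20)
print(f"linear scan: {linear_seconds:.6f}s, dict lookup: {dict_seconds:.6f}s")
```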

@oliverholworthy oliverholworthy added the enhancement New feature or request label May 12, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.05 milestone May 12, 2023
@oliverholworthy oliverholworthy self-assigned this May 12, 2023
@oliverholworthy oliverholworthy merged commit ec9bedf into NVIDIA-Merlin:main May 12, 2023
@oliverholworthy oliverholworthy deleted the embeddings-faster-lookup branch May 12, 2023 20:06
2 participants