Engine for the emoji whisperer npm package

Building a search engine for emojis

Implementation

The inverse word frequency ($iwf$) is calculated as:

$$ iwf(e, W) = \log \frac{|W|}{|\{w \in W : e \in w\}|} $$

where $|W|$ is the total number of words, and $|\{w \in W : e \in w\}|$ is the number of words where the emoji $e$ appears.
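
As a rough illustration of the formula above, here is a minimal sketch. The index shape (a `Map` from word to a `Map` of emoji counts), the sample data, and the function name `iwf` are assumptions made for this example and are not taken from the repository.

```javascript
// Hypothetical index shape: word -> Map(emoji -> co-occurrence count).
const index = new Map([
  ["pizza", new Map([["🍕", 3], ["😂", 1]])],
  ["funny", new Map([["😂", 4]])],
  ["sad", new Map([["😢", 2]])],
  ["lol", new Map([["😂", 5]])],
]);

// iwf(e, W) = log(|W| / |{w in W : e in w}|)
function iwf(emoji, index) {
  let wordsWithEmoji = 0;
  for (const emojiCounts of index.values()) {
    if (emojiCounts.has(emoji)) wordsWithEmoji += 1;
  }
  // Assumption: an emoji that never appears gets an iwf of 0.
  if (wordsWithEmoji === 0) return 0;
  return Math.log(index.size / wordsWithEmoji);
}

console.log(iwf("🍕", index)); // log(4 / 1): appears with one word out of four
console.log(iwf("😂", index)); // log(4 / 3): appears with three words out of four
```

An emoji that co-occurs with many different words, such as 😂 here, gets a lower $iwf$ than one tied to a single word.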

The median frequency ($mf$) is the median distance between a word and the emoji.
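
The text does not say how the distance is measured. The sketch below assumes it is the absolute difference in token positions within a document; the helper names `median` and `distances` and the sample documents are hypothetical.

```javascript
// Median of a list of numbers (NaN for an empty list).
function median(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Assumption: distance = absolute difference in token positions.
function distances(tokens, word, emoji) {
  const result = [];
  tokens.forEach((a, i) => {
    if (a !== word) return;
    tokens.forEach((b, j) => {
      if (b === emoji) result.push(Math.abs(i - j));
    });
  });
  return result;
}

// Hypothetical, already tokenized documents.
const docs = [
  ["great", "pizza", "🍕"],
  ["pizza", "tonight", "with", "friends", "🍕"],
  ["pizza", "is", "🍕"],
];
const all = docs.flatMap((tokens) => distances(tokens, "pizza", "🍕"));
console.log(median(all)); // median of the distances [1, 4, 2]
```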

The emoji-word frequency ($ewf$) is calculated as:

$$ ewf(e, w) = \frac{f_{e, w}}{\textit{total number of emojis}} $$

where $f_{e, w}$ is the count of emoji $e$ for word $w$.
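
A small worked example of this ratio follows. The denominator is read here as the total number of emoji occurrences in the corpus, which is an assumption; the counts and the name `ewf` are made up for illustration.

```javascript
// Hypothetical counts: emoji -> number of times it was seen with the word "pizza".
const countsForPizza = new Map([["🍕", 3], ["😂", 1]]);

// Assumption: "total number of emojis" means all emoji occurrences in the corpus.
const totalEmojis = 16;

// ewf(e, w) = (count of emoji e for word w) / (total number of emojis)
function ewf(emoji, emojiCounts, total) {
  const count = emojiCounts.get(emoji) ?? 0;
  return total > 0 ? count / total : 0;
}

console.log(ewf("🍕", countsForPizza, totalEmojis)); // 3 / 16
console.log(ewf("😂", countsForPizza, totalEmojis)); // 1 / 16
```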

Finally, the score is computed as:

$$ score(e, w, W) = \frac{{iwf(e, W)}}{{mf + ewf(e, w)}}$$

$e$ corresponds to an emoji, $w$ corresponds to a word in the query, and $W$ corresponds to the entire corpus from which the index was built.

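Given an input query $Q$, the score of an emoji is the sum of its per-word scores over the words in $Q$, where $mf_w$ is the median frequency for word $w$. The result is the $n$ emojis with the highest scores: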

$$ score(e, Q, W) = \sum_{{w \in Q}} \frac{{\textit{{iwf}}(e, W)}}{{\textit{{mf}}_w + \textit{{ewf}}(e, w)}} $$

$$ \textit{{topEmojis}}(Q, W, n) = \textit{{top }} n \textit{{ emojis }} e \textit{{ sorted by }} \textit{{score}}(e, Q, W) $$
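
The query scoring and ranking can be sketched as follows. The `stats` structure with precomputed values, the choice to store $mf$ per word and emoji pair, and the rule that a word never seen with an emoji contributes nothing are all assumptions for this example; the repository may organize this differently.

```javascript
// Hypothetical precomputed statistics for a tiny corpus.
const stats = {
  iwf: new Map([["🍕", 1.5], ["😂", 0.5]]),
  // word -> emoji -> { mf, ewf }
  words: new Map([
    ["pizza", new Map([
      ["🍕", { mf: 1, ewf: 0.5 }],
      ["😂", { mf: 3.75, ewf: 0.25 }],
    ])],
    ["night", new Map([["😂", { mf: 1.75, ewf: 0.25 }]])],
  ]),
};

// score(e, Q, W) = sum over w in Q of iwf(e, W) / (mf_w + ewf(e, w))
function score(emoji, queryWords, stats) {
  const iwf = stats.iwf.get(emoji) ?? 0;
  let total = 0;
  for (const word of queryWords) {
    const entry = stats.words.get(word)?.get(emoji);
    if (!entry) continue; // assumption: unseen (word, emoji) pairs add nothing
    total += iwf / (entry.mf + entry.ewf);
  }
  return total;
}

// Rank every emoji seen with at least one query word and keep the best n.
function topEmojis(queryWords, stats, n) {
  const candidates = new Set();
  for (const word of queryWords) {
    for (const emoji of stats.words.get(word)?.keys() ?? []) {
      candidates.add(emoji);
    }
  }
  return [...candidates]
    .map((emoji) => ({ emoji, score: score(emoji, queryWords, stats) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

console.log(topEmojis(["pizza", "night"], stats, 2));
```

With these numbers, 🍕 scores 1.5 / (1 + 0.5) = 1 and 😂 scores 0.5 / 4 + 0.5 / 2 = 0.375, so 🍕 ranks first.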

See the repository https://github.com/MagnusCardell/emoji-whisperer for an implementation of the index in Node.js.

Results

Example input sentences and the five top-scoring emoji groups returned for each:

Who else is excited for the new Avengers movie? #MarvelFan,"😙👌, ✨, 🤝, 😂🤣, 🤷‍♂️🙏"
Can't believe how beautiful the sunset was today. #NaturePhotography,"💖, 😙👌🏼, 🔋, 😔🙏, 👌🏼👌🏼"
Dinner at my favorite sushi place #Foodie,"😭, 🙌🙌🙌, 🤞🏽, 👏🏻, 👏"
Throwback to my trip to Paris last summer #TravelDiaries,"😎🤙🏽, 🤘🏽, 🐝✊, 🙌🌅, 😎😂👍"
Feeling so blessed to have such amazing people in my life #Blessed,"😂🙌🏼, 🔥, ❤️🙏🏾💯, 😐✋🏼, 🇸🇴"
That was the best concert ever!,"😩👌, 👌🏽, 🤩🙌, 🤏🏼, 🤚🏼"
I'm scared of spiders.,"🙈, ✨🤞🏼, 😔🤚, 😃👋, 😁✌️"
My heart is broken.,"💙, ❤, 🍃, 🙏🏽😩, 🥰👍"
I can't wait for my birthday.,"💎🙌, ✨, 💪🏾🔥, 🔥🔥🔥🙌🙌🙌, 🇬🇹"
angry,"👺, 💢, 🗯, 😖, 😣"
love,"🙏🏽😃, 💘, ♥, ❣, 🏩"
hate,"👈🏽💯👁, 🙄👎🏼, 😈👌🏻🔥, ✋🏼🙄, 💪🏾💖"
food,"🌭, 🌮, 🌯, 🌶, 🌽"
hungry,"😭🖕, 😔🤚🏽, 😂😂😂😭, 😊👍👍❤️👍, 👍😭"
tired,"😫, 🛀, 🛁, 🛏, 😪"
excited,"🤑👏🏼👏🏼, 😭🙌🏻, 😩🙌🏼, 🤩🙌🏻👏🏻, 🤪🙌🏻"
work,"👏🏼👍🏼, ⌨, 🏢, 💻, 💼"
home,"🏴󠁧󠁢󠁥󠁮󠁧󠁿✌️, 👠👠👠, ✈️, 🏘, 🏠"
play,"▶, 🎴, 😂😂😂👏🏽👏🏽👏🏽, 💯🙌🏽, 😎👍😂😂"
game,"😎👍👍🇸🇻, ♠, ♣, ♥, ♦"
sports,"⚽, ⚾, ⛷, ⛸, 🎱"
music,"👏🏾👏🏾🥺, 🎙, 🎚, 🎛, 🎵"
movie,"🎞, 🎟, 🎥, 🎦, 🎫"
book,"📖, 📔, 📕, 📗, 📘"
travel,"⛩, ⛰, ⛱, ⛲, ⛴"
adventure,"🙏🏽🏈, 🙏🏾👍🏾😎, 🙌🙌🙌💙💙💙🔥🔥🔥, 😎🤙🏼, 😃🙏🏽"
family,"👨‍👩‍👦, 👨‍👨‍👦, 👨‍👨‍👦‍👦, 👨‍👨‍👧, 👨‍👨‍👧‍👦"
party,"🍷, 🍾, 🎁, 🎆, 🎇"

Notes

Building the index:

Collect documents (sentences with emojis)
Tokenize the documents
Preprocess the tokens: lowercase, clean up, English
Index the documents with an inverted index
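
The steps above can be sketched as follows. This is a minimal illustration and not the repository's code: the sample documents, the whitespace tokenizer, the cleanup rule, the use of `\p{Extended_Pictographic}` to detect emoji tokens, and the function names are all assumptions. The English-specific step is left out because the notes do not say what it involves.

```javascript
// Step 1: hypothetical documents (sentences with emojis).
const documents = [
  "Great pizza tonight 🍕",
  "That joke was so funny 😂",
  "Pizza again 😂",
];

// Assumption: a token containing a pictographic character is an emoji group.
const isEmoji = (token) => /\p{Extended_Pictographic}/u.test(token);

// Step 2: split on whitespace, so adjacent emojis stay together as one group.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Step 3: lowercase words and strip punctuation; leave emoji tokens untouched.
function preprocess(token) {
  if (isEmoji(token)) return token;
  return token.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Step 4: inverted index, word -> Map(emoji -> co-occurrence count).
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    const tokens = tokenize(doc).map(preprocess).filter(Boolean);
    const emojis = tokens.filter(isEmoji);
    const words = tokens.filter((token) => !isEmoji(token));
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Map());
      const counts = index.get(word);
      for (const emoji of emojis) {
        counts.set(emoji, (counts.get(emoji) ?? 0) + 1);
      }
    }
  }
  return index;
}

const builtIndex = buildIndex(documents);
console.log(builtIndex.get("pizza")); // emoji counts recorded for "pizza"
```

Looking up a query word in this structure returns every emoji seen alongside it, which is the set of candidates that the scoring step then ranks.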
