Building a search engine for emojis
The inverse word frequency (
where
The median frequency (
The emoji-word frequency (
where
Finally, the score is computed as:
Given an input query
See the repository https://github.com/MagnusCardell/emoji-whisperer for an implementation of the index in Node.js.
Example input queries, each followed by its five top-scoring emoji groups:
Who else is excited for the new Avengers movie? #MarvelFan,"😙👌, ✨, 🤝, 😂🤣, 🤷♂️🙏"
Can't believe how beautiful the sunset was today. #NaturePhotography,"💖, 😙👌🏼, 🔋, 😔🙏, 👌🏼👌🏼"
Dinner at my favorite sushi place #Foodie,"😭, 🙌🙌🙌, 🤞🏽, 👏🏻, 👏"
Throwback to my trip to Paris last summer #TravelDiaries,"😎🤙🏽, 🤘🏽, 🐝✊, 🙌🌅, 😎😂👍"
Feeling so blessed to have such amazing people in my life #Blessed,"😂🙌🏼, 🔥, ❤️🙏🏾💯, 😐✋🏼, 🇸🇴"
That was the best concert ever!,"😩👌, 👌🏽, 🤩🙌, 🤏🏼, 🤚🏼"
I'm scared of spiders.,"🙈, ✨🤞🏼, 😔🤚, 😃👋, 😁✌️"
My heart is broken.,"💙, ❤, 🍃, 🙏🏽😩, 🥰👍"
I can't wait for my birthday.,"💎🙌, ✨, 💪🏾🔥, 🔥🔥🔥🙌🙌🙌, 🇬🇹"
angry,"👺, 💢, 🗯, 😖, 😣"
love,"🙏🏽😃, 💘, ♥, ❣, 🏩"
hate,"👈🏽💯👁, 🙄👎🏼, 😈👌🏻🔥, ✋🏼🙄, 💪🏾💖"
food,"🌭, 🌮, 🌯, 🌶, 🌽"
hungry,"😭🖕, 😔🤚🏽, 😂😂😂😭, 😊👍👍❤️👍, 👍😭"
tired,"😫, 🛀, 🛁, 🛏, 😪"
excited,"🤑👏🏼👏🏼, 😭🙌🏻, 😩🙌🏼, 🤩🙌🏻👏🏻, 🤪🙌🏻"
work,"👏🏼👍🏼, ⌨, 🏢, 💻, 💼"
home,"🏴✌️, 👠👠👠, ✈️, 🏘, 🏠"
play,"▶, 🎴, 😂😂😂👏🏽👏🏽👏🏽, 💯🙌🏽, 😎👍😂😂"
game,"😎👍👍🇸🇻, ♠, ♣, ♥, ♦"
sports,"⚽, ⚾, ⛷, ⛸, 🎱"
music,"👏🏾👏🏾🥺, 🎙, 🎚, 🎛, 🎵"
movie,"🎞, 🎟, 🎥, 🎦, 🎫"
book,"📖, 📔, 📕, 📗, 📘"
travel,"⛩, ⛰, ⛱, ⛲, ⛴"
adventure,"🙏🏽🏈, 🙏🏾👍🏾😎, 🙌🙌🙌💙💙💙🔥🔥🔥, 😎🤙🏼, 😃🙏🏽"
family,"👨👩👦, 👨👨👦, 👨👨👦👦, 👨👨👧, 👨👨👧👦"
party,"🍷, 🍾, 🎁, 🎆, 🎇"
Building the index:
- Collect documents (sentences that contain emojis)
- Tokenize the documents
- Preprocess the tokens: lowercase, clean up, restrict to English
- Index the documents with an inverted index
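
The steps above can be sketched in a few lines of Node.js. This is a minimal illustration under stated assumptions, not the code from the repository: the sample `documents`, the helper names (`extractEmojiGroups`, `tokenize`, `buildIndex`), and the choice to map each word to counts of co-occurring emoji groups are all hypothetical. The "English" step is approximated crudely by keeping only runs of the letters a-z.

```javascript
// Hypothetical sample documents: sentences that contain emojis.
const documents = [
  "I love this movie 🎥",
  "Love you ❤",
  "Great movie night 🎥🍿",
];

// A run of pictographic characters, optionally mixed with skin-tone
// modifiers, zero-width joiners, and variation selectors.
// Assumption: one contiguous run is treated as one "emoji group".
const EMOJI_RUN = /[\p{Extended_Pictographic}\p{Emoji_Modifier}\u200D\uFE0F]+/gu;

// Return every emoji group found in the text.
function extractEmojiGroups(text) {
  return text.match(EMOJI_RUN) || [];
}

// Tokenize and preprocess: lowercase, drop emojis, keep a-z words only.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(EMOJI_RUN, " ")
    .split(/[^a-z']+/)
    .filter((token) => token.length > 0);
}

// Build an inverted index: word -> Map(emoji group -> number of documents
// in which the word and the emoji group appear together).
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    const groups = extractEmojiGroups(doc);
    const words = new Set(tokenize(doc)); // count each word once per document
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Map());
      const postings = index.get(word);
      for (const group of groups) {
        postings.set(group, (postings.get(group) || 0) + 1);
      }
    }
  }
  return index;
}

const index = buildIndex(documents);
console.log(index.get("movie"));
```

Looking up a query word in `index` returns the emoji groups it co-occurred with and their counts. These are the kind of raw counts a frequency-based score could be computed from; the scoring itself is not implemented here.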