This project provides custom ksqlDB user-defined functions (UDFs) which help to handle emojis contained in text.
Currently it provides the following UDFs:
Name : EMOJIS_CONTAINED
Overview : leverages the emoji-java library to check if a string contains emojis
Type : SCALAR
Variations :
Variation : EMOJIS_CONTAINED(text VARCHAR)
Returns : BOOLEAN
Description : checks whether or not the given string contains emojis
text : the given text in which to check for any(!) emoji occurrences
Variation : EMOJIS_CONTAINED(text VARCHAR, specificEmojis ARRAY<VARCHAR>)
Returns : BOOLEAN
Description : checks whether or not the given string contains any of the specified emojis
text : the given text in which to check for any of the specified emoji occurrences
specificEmojis: a list of specific emojis to look for
Name : EMOJIS_COUNT
Overview : leverages the emoji-java library to count emojis within strings
Type : SCALAR
Variations :
Variation : EMOJIS_COUNT(text VARCHAR, unique BOOLEAN)
Returns : INT
Description : counts the emojis contained in the given string, with or without duplicates
text : the given text in which to count emojis
unique : if true, returns the count of unique emojis; if false, counts all emojis, i.e. including duplicates
Name : EMOJIS_EXTRACT
Overview : leverages the emoji-java library to extract emojis from strings
Type : SCALAR
Variations :
Variation : EMOJIS_EXTRACT(text VARCHAR, unique BOOLEAN)
Returns : ARRAY<VARCHAR>
Description : extracts a list of potentially contained emojis with or without duplicates from the given string
text : the given text to extract emojis from
unique : if true, returns only unique emojis (set semantics); if false, returns every emoji, i.e. including duplicates (list semantics)
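The effect of the unique flag on EMOJIS_COUNT and EMOJIS_EXTRACT can be pictured with a small, self-contained Java sketch. It is not the UDF implementation: the class and method names are hypothetical, and emoji detection is replaced by a deliberate simplification (every supplementary-plane code point is treated as an emoji), whereas the real functions rely on emoji-java.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of list vs. set semantics; not the project's code.
class EmojiExtractSketch {

    // Assumption: any supplementary-plane code point counts as an emoji.
    static List<String> extract(String text, boolean unique) {
        if (text == null) {
            return null; // mirrors the null-in, null-out rows in the sample output
        }
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> found.add(new String(Character.toChars(cp))));
        if (unique) {
            // Set semantics: drop duplicates, keep first-occurrence order.
            return new ArrayList<>(new LinkedHashSet<>(found));
        }
        return found; // list semantics: duplicates are kept
    }

    public static void main(String[] args) {
        String rocket = new String(Character.toChars(0x1F680)); // rocket emoji
        String clap = new String(Character.toChars(0x1F44F));   // clapping hands emoji
        String text = rocket + "go" + rocket + clap;
        System.out.println(extract(text, false).size()); // 3
        System.out.println(extract(text, true).size());  // 2
    }
}
```

In this sketch the count is simply the size of the extracted list, which is why the two functions share the same flag.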
Name : EMOJIS_REMOVE
Overview : leverages the emoji-java library to remove emojis contained in a string
Type : SCALAR
Variations :
Variation : EMOJIS_REMOVE(text VARCHAR)
Returns : VARCHAR
Description : removes emojis contained in a string
text : the given text from which to remove any(!) emojis
Variation : EMOJIS_REMOVE(text VARCHAR, specificEmojis ARRAY<VARCHAR>)
Returns : VARCHAR
Description : removes the specified emojis contained in a string
text : the given text from which to remove any of the specified emojis
specificEmojis: a list of specific emojis to remove
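The specificEmojis variations of EMOJIS_CONTAINED and EMOJIS_REMOVE restrict the operation to a caller-supplied list. A minimal Java sketch of that idea follows; the class and method names are hypothetical, and it uses plain substring matching as a stand-in for the emoji-java based logic, so it does not account for emojis that are part of longer sequences.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "specific emojis" semantics; not the project's code.
class SpecificEmojisSketch {

    // True if at least one of the listed emojis occurs in the text.
    static boolean containsAny(String text, List<String> specificEmojis) {
        for (String emoji : specificEmojis) {
            if (text.contains(emoji)) {
                return true;
            }
        }
        return false;
    }

    // Removes only the listed emojis; all other characters are left untouched.
    static String removeSpecific(String text, List<String> specificEmojis) {
        String result = text;
        for (String emoji : specificEmojis) {
            result = result.replace(emoji, "");
        }
        return result;
    }

    public static void main(String[] args) {
        String rocket = new String(Character.toChars(0x1F680)); // rocket emoji
        String clap = new String(Character.toChars(0x1F44F));   // clapping hands emoji
        String text = "lift" + rocket + "off" + clap;
        List<String> wanted = Arrays.asList(rocket);
        System.out.println(containsAny(text, wanted));                       // true
        System.out.println(removeSpecific(text, wanted).equals("liftoff" + clap)); // true
    }
}
```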
Name : EMOJIS_TO_ALIASES
Overview : leverages the emoji-java library to replace emojis contained in a string by their textual aliases
Type : SCALAR
Variations :
Variation : EMOJIS_TO_ALIASES(text VARCHAR, fpAction VARCHAR)
Returns : VARCHAR
Description : replaces emojis contained in a string by their textual aliases
text : the given text in which to replace any(!) emojis by their textual aliases
fpAction : how to deal with Fitzpatrick modifiers, must be one of: PARSE, REMOVE, IGNORE
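Fitzpatrick modifiers are the skin tone code points U+1F3FB to U+1F3FF that may follow an emoji. The sketch below illustrates the three fpAction values for a single emoji (boy, U+1F466), assuming the UDF passes the action through to emoji-java and that emoji-java behaves as its documentation describes: PARSE folds the modifier into the alias, REMOVE drops it, IGNORE leaves it in the text unchanged. The class, the helper names and the hard-coded alias are hypothetical and only serve the illustration.

```java
// Hypothetical single-emoji sketch of the three Fitzpatrick actions; not the project's code.
class FitzpatrickSketch {

    static final int BOY = 0x1F466;

    static boolean isFitzpatrick(int cp) {
        return cp >= 0x1F3FB && cp <= 0x1F3FF;
    }

    // U+1F3FB covers types 1 and 2; U+1F3FC..U+1F3FF map to types 3..6.
    static String typeName(int modifier) {
        return modifier == 0x1F3FB ? "type_1_2" : "type_" + (modifier - 0x1F3FB + 2);
    }

    static String boyToAlias(String text, String fpAction) {
        if (!fpAction.equals("PARSE") && !fpAction.equals("REMOVE") && !fpAction.equals("IGNORE")) {
            throw new IllegalArgumentException("fpAction must be PARSE, REMOVE or IGNORE");
        }
        int[] cps = text.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            if (cps[i] != BOY) {
                out.appendCodePoint(cps[i]); // everything else is copied as-is
                continue;
            }
            boolean modified = i + 1 < cps.length && isFitzpatrick(cps[i + 1]);
            if (modified && fpAction.equals("PARSE")) {
                out.append(":boy|").append(typeName(cps[i + 1])).append(":");
                i++; // modifier is folded into the alias
            } else if (modified && fpAction.equals("REMOVE")) {
                out.append(":boy:");
                i++; // modifier is dropped
            } else {
                out.append(":boy:"); // IGNORE: a modifier, if present, is copied on the next iteration
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String boy = new String(Character.toChars(BOY)) + new String(Character.toChars(0x1F3FF));
        System.out.println(boyToAlias("a " + boy + "!", "PARSE"));  // a :boy|type_6:!
        System.out.println(boyToAlias("a " + boy + "!", "REMOVE")); // a :boy:!
        System.out.println(boyToAlias("a " + boy + "!", "IGNORE")); // a :boy: followed by the unchanged modifier, then !
    }
}
```

The sample content used in the examples further down contains no Fitzpatrick modifiers, so all three actions would yield the same result for it.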
Name : EMOJIS_TO_HTMLCODEPOINTS
Version : 1.0.0
Overview : leverages the emoji-java library to replace emojis contained in a string by their HTML codepoints
Type : SCALAR
Variations :
Variation : EMOJIS_TO_HTMLCODEPOINTS(text VARCHAR, fpAction VARCHAR, encoding VARCHAR)
Returns : VARCHAR
Description : replaces emojis contained in a string by their HTML codepoints
text : the given text in which to replace any(!) emojis by their HTML codepoints
fpAction : how to deal with Fitzpatrick modifiers, must be one of: PARSE, REMOVE, IGNORE
encoding : which HTML codepoints representation to use, must be one of: HEX, DEC
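The two encoding values differ only in how the Unicode code point is written: HEX yields a hexadecimal character reference such as &#x1f680;, DEC the decimal form &#128640;. The following Java sketch shows that difference under a simplifying assumption (every supplementary-plane code point is treated as an emoji); the class and method names are hypothetical and the real UDF delegates to emoji-java.

```java
// Hypothetical sketch of HEX vs. DEC HTML codepoints; not the project's code.
class HtmlCodepointSketch {

    static String toHtml(String text, String encoding) {
        if (!encoding.equals("HEX") && !encoding.equals("DEC")) {
            throw new IllegalArgumentException("encoding must be HEX or DEC");
        }
        boolean hex = encoding.equals("HEX");
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                // Assumption: supplementary-plane code points stand in for emojis.
                out.append(hex ? "&#x" + Integer.toHexString(cp) + ";" : "&#" + cp + ";");
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String rocket = new String(Character.toChars(0x1F680)); // rocket emoji
        System.out.println(toHtml("go" + rocket, "HEX")); // go&#x1f680;
        System.out.println(toHtml("go" + rocket, "DEC")); // go&#128640;
    }
}
```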
The UDF call examples below are based on the following pre-defined sample content:
-- 'create stream with example content'
CREATE STREAM examples
(id VARCHAR, content VARCHAR)
WITH (kafka_topic='examples',value_format='JSON',partitions=1,replicas=1,key='id');
-- 'insert a few records with or without emojis'
INSERT INTO examples (id) VALUES ('1');
INSERT INTO examples (id,content) VALUES ('2','');
INSERT INTO examples (id,content) VALUES ('3','This is text without any emojis.');
INSERT INTO examples (id,content) VALUES ('4','🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏');
-- 'have fun with the EMOJI UDFs!'
ksql> SELECT id,content,EMOJIS_CONTAINED(content) AS result FROM examples EMIT CHANGES;
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
|ID |CONTENT |RESULT |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
|1 |null |null |
|2 | |false |
|3 |This is text without any emojis. |false |
|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏 |true |
^CQuery terminated
ksql> SELECT id,content,EMOJIS_COUNT(content,false) AS result1,EMOJIS_COUNT(content,true) AS result2 FROM examples EMIT CHANGES;
+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
|ID |CONTENT |RESULT1 |RESULT2 |
+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
|1 |null |null |null |
|2 | |0 |0 |
|3 |This is text without any emojis. |0 |0 |
|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏 |12 |8 |
^CQuery terminated
ksql> SELECT id,content,EMOJIS_EXTRACT(content,false) AS result1,EMOJIS_EXTRACT(content,true) AS result2 FROM examples EMIT CHANGES;
+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
|ID |CONTENT |RESULT1 |RESULT2 |
+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
|1 |null |null |null |
|2 | |[] |[] |
|3 |This is text without any emojis. |[] |[] |
|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏 |[🤓, 🤓, 🤩, 🌻, 🌺, 🍄, 🍄, 🎸, 🚀, 🚀, 🚀, 👏] |[🤓, 🤩, 🌻, 🌺, 🍄, 🎸, 🚀, 👏] |
^CQuery terminated
ksql> SELECT id,content,EMOJIS_REMOVE(content) AS result FROM examples EMIT CHANGES;
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
|ID |CONTENT |RESULT |
+----------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------------------+
|1 |null |null |
|2 | | |
|3 |This is text without any emojis. |This is text without any emojis. |
|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏 |This is textwithemojis. |
^CQuery terminated
ksql> SELECT id,content,EMOJIS_TO_ALIASES(content,'PARSE') AS result FROM examples EMIT CHANGES;
+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+
|ID |CONTENT |RESULT |
+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+
|1 |null |null |
|2 | | |
|3 |This is text without any emojis. |This is text without any emojis. |
|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀.👏 |:nerd::nerd:This :star_struck: is text:sunflower::hibisc|
| | |us::mushroom::mushroom:with:guitar::rocket:emojis:rocket|
| | |::rocket:.:clap: |
^CQuery terminated
ksql> SELECT id,content,EMOJIS_TO_HTMLCODEPOINTS(content,'PARSE','HEX') AS result1,EMOJIS_TO_HTMLCODEPOINTS(content,'PARSE','DEC') AS result2 FROM examples EMIT CHANGES;
+---------------------------------------------+-------------------------------------------------+---------------------------------------------+---------------------------------------------+
|ID |CONTENT |RESULT1 |RESULT2 |
+---------------------------------------------+-------------------------------------------------+---------------------------------------------+---------------------------------------------+
|1 |null |null |null |
|2 | | | |
|3 |This is text without any emojis. |This is text without any emojis. |This is text without any emojis. |
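|4 |🤓🤓This 🤩 is text🌻🌺🍄🍄with🎸🚀emojis🚀🚀 |&#x1f913;&#x1f913;This &#x1f929; is text&#x1f|&#129299;&#129299;This &#129321; is text&#127|
| |.👏 |33b;&#x1f33a;&#x1f344;&#x1f344;with&#x1f3b8;&|803;&#127802;&#127812;&#127812;with&#127928;&|
| | |#x1f680;emojis&#x1f680;&#x1f680;.&#x1f44f; |#128640;emojis&#128640;&#128640;.&#128079; |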
^CQuery terminated
- You can either build the Maven project from sources or download the latest release as a self-contained jar.
- Move the emoji-functions-1.0.jar file into the folder from which your ksqlDB installation is configured to load custom functions during server bootstrap (the directory set via the ksql.extension.dir property).
- (Re)start your ksqlDB server instance(s) so that they pick up and load the emoji functions.
- Verify that the deployment was successful by opening a ksqlDB CLI session and running
SHOW FUNCTIONS;
which should list the following emoji-related UDFs among all other available functions:
Function Name | Type
-----------------------------------
...
EMOJIS_CONTAINED | SCALAR
EMOJIS_COUNT | SCALAR
EMOJIS_EXTRACT | SCALAR
EMOJIS_REMOVE | SCALAR
EMOJIS_TO_ALIASES | SCALAR
EMOJIS_TO_HTMLCODEPOINTS | SCALAR
...
-----------------------------------
Thanks to Vincent Durmont for the great emoji-java library, which does the hard emoji work behind the scenes!