Bulk index speed is slow #290
You could try the NodeJS client: https://github.com/valeriansaliou/node-sonic-channel, which is official and whose performance I've measured to be rather good, yes.
Thanks for your suggestion.
Is there any way to put a lot of data into Sonic for testing purposes? There is no example code in the node client's GitHub repository, just an ingest.js that sends a single push to the server.
The NodeJS library will split the text data into sub-command chunks, so that definitely works, though you should consider pre-splitting your data before pushing. Sonic was originally built for indexing chat messages and emails, which is why everything is centered around small chunks of data. In other words, it is intended that inserting 1M messages results in 1M+ commands (a bit more, since some messages are larger than the max chunk size, but the NodeJS library handles the splitting for you based on the buffer size the server provides dynamically).
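To illustrate the chunking idea, here is a minimal sketch of how a client might pre-split a long text into chunks no larger than the server-advertised buffer size, cutting on whitespace so words stay whole. The function name and logic are illustrative only, not part of the node-sonic-channel API:

```javascript
// Hypothetical helper: split `text` into chunks of at most `maxChunkSize`
// characters, breaking only between words. Each chunk would then be sent
// as its own PUSH command.
function splitIntoChunks(text, maxChunkSize) {
  const chunks = [];
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  let current = "";
  for (const word of words) {
    // +1 accounts for the space that would join `word` onto `current`
    if (current.length + word.length + 1 > maxChunkSize && current.length > 0) {
      chunks.push(current);
      current = word;
    } else {
      current = current.length > 0 ? current + " " + word : word;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A real client would query the server for its buffer size first and use that as `maxChunkSize` rather than hard-coding a value.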
To maximize speed, note that you should split the work between multiple NodeJS instances, each running the ingestion on its own split of your data. Say you have 4 cores on the server running the ingestion script: you'd split your data in 4 and run 4 NodeJS instances to push that data to Sonic. This is because each ingestion thread can be seen as a synchronous command channel, blocking for a few microseconds on each PUSH command. On the Sonic server end (on another server), to maximize ingestion speed, you should also ensure you have as many CPUs as there are data-producer NodeJS instances (Sonic spawns 1 thread per Sonic Channel opened over TCP by clients), plus some spare CPUs for the RocksDB internal threads to do their work. That way you can max out both your importer server and your Sonic server. Also make sure everything is running on fast SSDs.
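The data-splitting step above can be sketched as a simple round-robin partitioner (illustrative code, not Sonic-specific): with 4 cores you would pass `k = 4` and hand each slice to its own NodeJS ingestion process.

```javascript
// Partition `items` into `k` near-equal slices, one per ingestion process.
function partition(items, k) {
  const slices = Array.from({ length: k }, () => []);
  items.forEach((item, i) => {
    slices[i % k].push(item); // round-robin keeps slice sizes balanced
  });
  return slices;
}
```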
Thanks for your detailed explanation. I tested go-sonic with simple pushes and it showed better performance.
Hi everyone at @Sonic,
I was looking for alternatives to Elasticsearch because of resource-usage issues on our ES cluster, and Sonic looked like a very useful option.
So I decided to run some benchmarks on it and see how it would respond. The problem I now face is that bulk index insertion is very slow compared to other search engines such as Elasticsearch/Zinc.
I'm using the go-sonic client to bulk-insert some data into Sonic, and it took about 1-2 hours for the data below! Should I switch to the NodeJS client, for example?
Data size: about 50 MB
Doc count: 2M strings as Text fields
Notes:
Used the same config for Sonic as shown on the GitHub page.
Also ran Sonic in Docker.
Thanks for any help indeed :)
PS: As a comparison, Elasticsearch indexed about 5 GB of data in 1 hour in my tests.