Bulk index speed is slow #290
You could try the NodeJS client: https://github.com/valeriansaliou/node-sonic-channel, which is official and whose performance I've measured to be rather good, yes.
Thanks for your suggestion.
Is there any way to put a lot of data into Sonic for testing purposes? There is no example code in the node client's GitHub repository, just an ingest.js that sends a single push to the server.
The NodeJS library will split the text data into sub-command chunks, so that definitely works, though you should consider pre-splitting your data before pushing. Sonic was originally built for indexing chat messages and emails, which is why everything is centered around small chunks of data. In other words, it is intended that inserting 1M messages results in 1M+ commands (a bit more, since some messages are larger than the max chunk size, but the NodeJS library handles the splitting for you based on the buffer size the server provides dynamically).
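To illustrate the chunking idea, here is a minimal sketch of how a client might pre-split a long text into chunks no larger than the server-advertised buffer size, cutting on whitespace so words stay whole. The function name and logic are illustrative only, not part of the node-sonic-channel API:

```javascript
// Hypothetical helper: split `text` into chunks of at most `maxChunkSize`
// characters, breaking only between words. Each chunk would then be sent
// as its own PUSH command.
function splitIntoChunks(text, maxChunkSize) {
  const chunks = [];
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  let current = "";
  for (const word of words) {
    // +1 accounts for the space that would join `word` onto `current`
    if (current.length + word.length + 1 > maxChunkSize && current.length > 0) {
      chunks.push(current);
      current = word;
    } else {
      current = current.length > 0 ? current + " " + word : word;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A real client would query the server for its buffer size first and use that as `maxChunkSize` rather than hard-coding a value.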
To maximize speed, note that you should split the work between multiple NodeJS instances, each running the ingestion on its own split of your data. Say you have 4 cores on the server running the ingestion script: you'd split your data in 4 and run 4 NodeJS instances to push that data to Sonic. This is because each ingestion thread can be seen as a synchronous command channel, blocking for a few microseconds on each PUSH command. On the Sonic server end (on another server), to maximize ingestion speed, you should also ensure you have as many CPUs as there are data-producer NodeJS instances (Sonic spawns 1 thread per Sonic Channel opened over TCP by clients), plus some spare CPUs for the RocksDB internal threads to do their work. That way you can max out both your importer server and your Sonic server. Also make sure everything is running on fast SSDs.
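The data-splitting step above can be sketched as a simple round-robin partitioner (illustrative code, not Sonic-specific): with 4 cores you would pass `k = 4` and hand each slice to its own NodeJS ingestion process.

```javascript
// Partition `items` into `k` near-equal slices, one per ingestion process.
function partition(items, k) {
  const slices = Array.from({ length: k }, () => []);
  items.forEach((item, i) => {
    slices[i % k].push(item); // round-robin keeps slice sizes balanced
  });
  return slices;
}
```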
Thanks for your detailed explanation. I tested go-sonic with simple pushes and it showed better performance.
Hi everyone at @Sonic,
I was looking for alternatives to Elasticsearch because of resource-usage issues on our ES cluster, and Sonic looked like a very useful option.
So I decided to run some benchmarks on it and see how it would respond. The problem I now face is that bulk index insertion is very slow compared to other search engines such as Elasticsearch/Zinc.
I'm using the go-sonic client to bulk-insert some data into Sonic, and it took about 1-2 hours for the data below! Should I switch to the NodeJS client, for example?
Data size: about 50 MB
Doc count: 2M strings as Text fields
Notes:
Used the same config for Sonic as shown on the GitHub page.
Also ran Sonic in Docker.
Thanks for any help indeed :)
PS: As a comparison, Elasticsearch indexed about 5 GB of data in 1 hour in my tests.