High performance multithreaded indexer for the Path of Exile Stash API coded in C++11.
JSON files are pre-downloaded ahead through a first thread. In a second thread, downloaded files are parsed and data inserted into the database. Insertion leverages the LOAD DATA INFILE MySQL statement for increased insertion speed.
Two threads are used:
- The first one, download_loop downloads JSON changes at a regular interval
- The second one, processing_loop parses these files and insert data into the database
As files are downloaded through the first thread, their name is added to a queue. This queue is read by the second thread which progressively parses the corresponding files.
The second thread iterates over stashes, inserting items one by one. Item properties (properties, affixes, requirements and sockets) are however stored for a delayed insertion. When a certain amount of JSON files have been parsed, item properties are written to text files and inserted through 4 different threads (properties, affixes, requirements and sockets).
JSON files are downloaded with gzip compression using libCurl.
The RapidJSON library is used to parse JSON files.
Storage is done through MySQL (see schema.sql) and the interface with MySQL is done through the Connector/C++ library, the official MySQL C++ communication API.
Hardware
Hardware Intel Xeon W3520 (4 cores / 8 threads at 2.66 GHz+) with 16 Go RAM and SSD storage.
MySQL settings
innodb_buffer_pool_size = 12G
innodb_log_file_size = 2047M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table=1
innodb_stats_on_metadata = OFF
innodb_buffer_pool_instances = 8
query_cache_type = 0
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 8M
Results
The indexer memory consumption is very small, about 300 MB (2%).
The plot below shows the evolution of item insertion speed across time.
The first thing you need is to have a running MySQL installation correctly tuned for your hardware.
To run the indexer, you will need to install the required dependencies:
- LibCurl
- RapidJSON
- Connector/C++
On MacOS, download Mysql-c++-connector. Extract the archive and copy the content of the lib
folder to /usr/local/lib
, and the content of the include
folder to /usr/local/include
. Next, download RapidJSON and copy the content of include
to /usr/local/include
.
On Ubuntu, install the following packages using apt-get install libcurl4-gnutls-dev libcurlpp-dev libmysqlcppconn-dev
. You may have to edit your MySQL configuration file (my.cnf) to authorize the LOAD DATA INFILE statement. Next, download RapidJSON and copy the content of include
to /usr/local/include
.
When all dependencies have been installed, open up a terminal and change directory to the indexer one and type make
to generate the binary. If all dependencies have been found, the code should now be compiled. Import the schema.sql DB scheme in your MySQL installation. The last step is to edit the config.cfg file to put the right credentials in.
When compiled, just run the indexer using ./indexer
. The program also allows to start from a specific change_id using ./indexer change_id
.