KH Coder Speed on Linux: MySQL settings to boost performance #883
-
Hi. When you run "pre-processing", KH Coder prints output to the console.
Can you run pre-processing of the tutorial file on both Windows and Linux and post both outputs?
-
BTW, if you want to avoid a TCP/IP connection to MySQL, you can try editing config/coder.ini as follows.
-
I think the problem is more sinister. I was using exactly the same file on both systems. My Windows 10 run finishes in 9 minutes, while the Linux run terminated after 2.5 hours. I have tuned my Linux installation to give MySQL enough heap space (so that KHC does not switch to ISAM storage) and used the socket connection to increase the speed. This is the pre-processing dump you asked for:
The real problem starts after all pre-processing is finished. KHC seems to be stuck saving data to its database. When I peeked at the MySQL connection, the following code was slowly being pumped into the database:
I've checked that the indices are all there and should be used, but it is all so much slower!
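For anyone wanting to repeat this kind of check, here is a minimal sketch using standard MySQL statements. The table name `rowdata` is borrowed from another comment in this thread purely as an illustration; it is an assumption that it is the table involved in the slow statements.

```sql
-- List every connection together with the full text of the statement it is running.
SHOW FULL PROCESSLIST;

-- List the indexes defined on a table (run this inside the KH Coder project database).
-- `rowdata` is only an example table name, not necessarily the slow one.
SHOW INDEX FROM rowdata;
```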
-
Hmm, would you check your "innodb_flush_log_at_trx_commit" MySQL server setting? If it's "1", can you try setting it to "2" or "0", restarting the MySQL server, and running KH Coder's pre-processing again?
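A minimal sketch of checking and changing this setting from a MySQL client, assuming an account with administrative privileges. `SET GLOBAL` takes effect without a restart but is lost when the server restarts, so a permanent change would also need the same setting under `[mysqld]` in the server configuration file.

```sql
-- Show the value the running server is actually using.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Change it at runtime: the log is written at each commit but flushed to disk
-- only about once per second, trading some durability for speed.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Confirm the new value.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```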
-
I have tried this, but it made no difference. I have also followed the thread on "Slow mysql on Ubuntu" and changed various MySQL parameters, e.g.
However, none of these had any significant effect. While on Windows the file is processed in 10 minutes, on Linux it takes days. By the way, I have tried it on two different computers, one with 64 GB of RAM.
-
Well, I think the "innodb_flush_log_at_trx_commit" setting should have an enormous impact on performance. Would you check whether the setting is actually in effect?
Also, the results of "show table status like 'rowdata'" may help. As a last resort, if you are not using your MySQL for other purposes, you can try setting the default storage engine to MyISAM. (I assume your current default is InnoDB.) The Windows version of KH Coder comes with MySQL 5.6 with the following settings (I think you already knew that, but anyway...).
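Separately from the bundled settings mentioned above, the table status check and the engine switch suggested in this comment could be run roughly as follows. This is a sketch only; it assumes the statements are issued inside the KH Coder project database by an account with administrative privileges.

```sql
-- Shows the storage engine, row count and data size of the table.
SHOW TABLE STATUS LIKE 'rowdata';

-- Which engine newly created tables get by default.
SHOW VARIABLES LIKE 'default_storage_engine';

-- Switch the default for new connections. Existing tables are not converted,
-- and the change is lost on restart unless it is also set in the server config file.
SET GLOBAL default_storage_engine = MyISAM;
```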
-
Hi, I have finally cracked the problem. I switched to MyISAM with your parameters and KHC worked very fast. However, I noticed that your key_buffer_size was 384 MB while mine was only 16 MB, and the read buffer settings were missing from my configuration altogether. So I went back to the InnoDB storage engine but included the last three variable settings, i.e.
Restarted MySQL and KH Coder now works beautifully fast on Ubuntu 18.04! Thank you for your help -- Jacob
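For readers comparing their own server, a hedged sketch follows. The exact three settings are not reproduced here; only the 384 MB `key_buffer_size` figure comes from this thread, and treating "read buffer settings" as `read_buffer_size` and `read_rnd_buffer_size` is an assumption.

```sql
-- Current key buffer size in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';

-- Matches read_buffer_size and read_rnd_buffer_size (assumed to be the
-- "read buffer settings" meant above).
SHOW VARIABLES LIKE 'read%buffer_size';

-- Raise the key buffer to 384 MB at runtime. To survive a restart, the same
-- value also has to go under [mysqld] in the server configuration file.
SET GLOBAL key_buffer_size = 384 * 1024 * 1024;
```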
-
My Linux installation is incredibly slow compared with the one on Windows. The main problem is either with the Stanford POS Tagger or with sending its results to the MySQL database. This means that KHC works fine on small tutorial documents but is unusable for larger document sets. I suspect the reason is that the Windows version relies on named pipes to communicate with MySQL, while the Linux version talks to MySQL via the network protocol. Has anybody succeeded in improving KHC performance on Linux? (And yes, I have tried loading and exporting the documents on Windows, but importing them on Linux still takes forever.)