one thread pinning a single core continuously #7447
@pbooth Can you run the following and attach the output to the issue?
This tarball contains the requested files.
@pbooth Can you attach the logs for the last few hours where this CPU is pegged? Also, could you run the following? It looks like one shard might be stuck in a full compaction loop, which was fixed by #7258, but there might be another case where this is occurring.
Log and find output attached.
@pbooth Are you writing data to these hosts? You mentioned that:
This will cause very expensive compactions to remove the duplicate data and can take a while. It may also mean you are writing points back in time, causing cold shards to become hot again and need to be re-compacted.
So the issue continues for days/weeks after I finish writing. I had assumed that writing points to InfluxDB was idempotent; it didn't occur to me that duplicate points could be written. By duplicate I mean strictly duplicate: the same line protocol data being uploaded to an endpoint.
The points are appended to the WAL and compacted into TSM files based on the time when they are stored. If the same points are written over and over, there are multiple copies of those points until they get compacted. You won't see the same data returned in queries, though, since we deduplicate the data before it's returned (if there are duplicates). I only see one compaction started in the logs you attached, and the server looks like it was only running for about 20 minutes. I'd need to see a longer period of the logs to tell whether it's the same issue as #7258. Full compactions do not start until a shard has been cold for 24 hours. If you are restarting the server frequently, a full compaction may be getting killed and has to restart the next time.
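For illustration only (this is not InfluxDB's actual code): a minimal Go sketch of the deduplicate-on-read idea described above, where multiple copies of a point at the same timestamp collapse to a single value, with the last write winning.

package main

import (
	"fmt"
	"sort"
)

// Point is a simplified stand-in for one stored field value at a timestamp.
type Point struct {
	Time  int64 // nanoseconds since the epoch
	Value float64
}

// dedupe sorts points by time and keeps only the last value written for each
// timestamp, mimicking the "you won't see duplicates in query results" behavior.
func dedupe(points []Point) []Point {
	// A stable sort keeps write order among points with equal timestamps,
	// so the most recently written copy ends up last within each group.
	sort.SliceStable(points, func(i, j int) bool { return points[i].Time < points[j].Time })
	out := make([]Point, 0, len(points))
	for _, p := range points {
		if n := len(out); n > 0 && out[n-1].Time == p.Time {
			out[n-1] = p // same timestamp: last write wins
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pts := []Point{{100, 1.0}, {200, 2.0}, {100, 1.5}} // timestamp 100 written twice
	fmt.Println(dedupe(pts))                           // [{100 1.5} {200 2}]
}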
I can get more logs from this host and the other host that's currently showing this behavior. They typically run for a few weeks without restarting.
influxlog2.tar.gz These two tarballs contain seven days' worth of logs for the two problematic instances that are showing the 100% CPU usage.
I'm not seeing any issue in the logs due to compactions constantly running, as was happening in #7258. It looks like new compactions are getting scheduled as expected. As I already mentioned, overwriting points causes compactions to be very expensive and can tie up a single core for a while until they complete. Could you run:
This is it ...
@pbooth It looks like you are writing to a single series at nanosecond precision. If you are overwriting points, that entire file ends up getting re-compacted, which will tie up a core until it completes.
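A hedged illustration of the point above (the measurement and field below are made up): in line protocol, a point's identity is the measurement plus tag set plus timestamp, so re-sending an identical line with the same nanosecond timestamp overwrites the existing point rather than adding a new one.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical series: a single measurement, no tags, nanosecond timestamps.
	ts := time.Date(2016, 11, 18, 16, 32, 6, 0, time.UTC).UnixNano()

	// Line protocol shape: <measurement>[,<tag set>] <field>=<value> <timestamp-ns>
	line := fmt.Sprintf("trades price=101.25 %d", ts)

	// Uploading this exact line twice does not create two points: the second
	// write lands on the same (measurement, tag set, timestamp) key, and the
	// extra copy is merged away during compaction or at query time.
	fmt.Println(line)
	fmt.Println(line)
}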
Some more data: the more serious consequence of this issue isn't that it uses up a whole core, it's that it makes the CLI non-responsive for a range of commands. I realized that the influx_inspect dumptsm command that you had asked me to run refers to the retention policy "eight_weeks". This is an old, now unused retention policy, and any overwritten points could only have been in a shard within the default four_years retention policy. On a hunch I tried the following:
This request didn't return to the CLI, and when I started another CLI it wasn't responsive. Does the fact that the CPU usage jumps to 100% as soon as InfluxDB is restarted imply that full compactions can't be the cause of this, if full compactions don't start until 24 hours after restart? Something else I didn't expect: after doing a DROP SHARD for every shard in the eight_weeks and three_years retention policies, I restarted influx. The DROP SHARD didn't actually remove the tsm files, and I see that the influxdb process still has these files open. Is this expected?
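On the "DROP SHARD didn't actually remove the tsm file" observation: this is general Linux/POSIX behavior rather than anything InfluxDB-specific. Unlinking a file that a process still has open (or memory-mapped) only removes the directory entry; the inode and its disk space are reclaimed once the last open handle goes away, which is why the influxd process can still show the deleted files as open. A minimal sketch:

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	// Create a scratch file and write to it.
	f, err := os.CreateTemp("", "unlink-demo-*.tsm")
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(f, "still readable after unlink")

	// Remove the directory entry while the descriptor is still open.
	if err := os.Remove(f.Name()); err != nil {
		panic(err)
	}

	// The data remains accessible through the open descriptor; the space is
	// only freed once the last handle is closed.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		panic(err)
	}
	data, _ := io.ReadAll(f)
	fmt.Print(string(data))
	f.Close()
}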
An update: I have an influxdb instance that has been running since 2016/11/18 13:18:4. At 2016/11/18 16:32:06 the disk filled, so it began writing errors like this:

[write] 2016/11/18 16:32:06 write failed for shard 669: engine: error writing WAL entry: write /var/lib/influxdb/wal/intake/four_years/669/_00042.wal: no space left on device

Then at 2016/11/19 13:18:46, 24 hours after influxdb started, it began trying to do full compactions (normal behavior):

[tsm1] 2016/11/19 13:18:46 beginning full compaction of group 0, 2 TSM files

At approximately 2016/11/21 11:03 I noticed the problem and freed up some space. Twenty minutes later it logged "database index loaded" and then attempted a full compaction, apparently failing with "aborting compaction". Thirty minutes after that it entered a loop of attempting and aborting the compaction, which has repeated 347 times:

[tsm1] 2016/11/21 11:56:40 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000016-000000004.tsm (#0)

Could I restart the instance with the configuration changed so it begins full compactions after one hour instead of 24? Any suggestions?
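On the last question: the cold window before full compactions is configurable. If memory serves, the setting is compact-full-write-cold-duration in the [data] section of influxdb.conf (verify the exact name and default against your version's sample config). Conceptually, the scheduling rule looks something like this hypothetical sketch, not InfluxDB's actual code:

package main

import (
	"fmt"
	"time"
)

// coldEnough reports whether a shard has gone without writes for longer than
// the configured cold duration and is therefore eligible for a full compaction.
// The names here are hypothetical; this only illustrates the scheduling rule.
func coldEnough(lastWrite time.Time, coldDuration time.Duration) bool {
	return time.Since(lastWrite) >= coldDuration
}

func main() {
	lastWrite := time.Now().Add(-2 * time.Hour)      // last write two hours ago
	fmt.Println(coldEnough(lastWrite, 24*time.Hour)) // false: not cold for a full day yet
	fmt.Println(coldEnough(lastWrite, 1*time.Hour))  // true: a 1h setting would schedule it now
}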
You can ignore the aborted compaction log messages. See #7425 |
Might be related: https://groups.google.com/forum/#!topic/golang-dev/PVwDFD7gDuk |
@mark-rushakoff I didn't notice the PR. @jwilder runtime.Gosched() would help in the loop of Merge() if it is merging a lot of data. I don't think it makes much sense in combine(), as its loop can be preempted on the next iteration.
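A rough sketch of the kind of change being discussed (this is not the actual Merge() code): dropping an occasional runtime.Gosched() into a long merge loop so the goroutine yields, since in Go versions of that era a tight loop with no allocations could leave few preemption points and keep a pending stop-the-world GC phase waiting on one busy core.

package main

import (
	"fmt"
	"runtime"
)

// mergeSorted merges two sorted int slices. With the output preallocated there
// may be no preemption points inside the loop, so on very long inputs the
// goroutine yields explicitly every few thousand iterations.
func mergeSorted(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	for i := 0; len(a) > 0 || len(b) > 0; i++ {
		if i%4096 == 0 {
			runtime.Gosched() // let the scheduler (and GC) run during long merges
		}
		if len(b) == 0 || (len(a) > 0 && a[0] <= b[0]) {
			out = append(out, a[0])
			a = a[1:]
		} else {
			out = append(out, b[0])
			b = b[1:]
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([]int{1, 3, 5}, []int{2, 4, 6})) // [1 2 3 4 5 6]
}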
Nice, thanks. I guess this can be closed then.
Bug report
InfluxDB 1.0.0~rc2-1, 1.0.0-1, and 1.0.2-1 (but not 0.13.0-1 or earlier), running on RHEL 6.6 or 6.7 on dual Haswell / Ivy Bridge servers with 256 GB of memory and local storage or FusionIO.
Actual behavior: I see one thread pinning a single core continuously, with no queries running and no continuous queries defined.
This is what the problematic thread is doing:
Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x000000000042141d in runtime.shade ()
#1 0x0000000000414fe4 in runtime.gcmarkwb_m ()
#2 0x0000000000456494 in runtime.writebarrierptr_nostore1.func1 ()
#3 0x00000000004611ab in runtime.systemstack ()
#4 0x0000000000415081 in runtime.writebarrierptr_nostore1 ()
#5 0x0000000000415182 in runtime.writebarrierptr ()
#6 0x000000000045687c in runtime.typedslicecopy.func1 ()
#7 0x0000000000461179 in runtime.systemstack ()
#8 0x0000000000435710 in runtime.startTheWorldWithSema ()
#9 0x000000c820018000 in ?? ()
#10 0x0000000000000000 in ?? ()
Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x0000000000415121 in runtime.writebarrierptr ()
#1 0x000000000045687c in runtime.typedslicecopy.func1 ()
#2 0x0000000000461179 in runtime.systemstack ()
#3 0x0000000000435710 in runtime.startTheWorldWithSema ()
#4 0x000000c820018000 in ?? ()
#5 0x0000000000000000 in ?? ()
Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x00000000004163c2 in runtime.heapBitsBulkBarrier ()
#1 0x00000000004152e8 in runtime.typedmemmove ()
#2 0x00000000004567fa in runtime.typedslicecopy.func1 ()
#3 0x0000000000461179 in runtime.systemstack ()
#4 0x0000000000435710 in runtime.startTheWorldWithSema ()
#5 0x000000c820018000 in ?? ()
#6 0x0000000000000000 in ?? ()
Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x000000000041509e in runtime.writebarrierptr_nostore1 ()
#1 0x0000000000415182 in runtime.writebarrierptr ()
#2 0x000000000045688f in runtime.typedslicecopy.func1 ()
#3 0x0000000000461179 in runtime.systemstack ()
#4 0x0000000000435710 in runtime.startTheWorldWithSema ()
#5 0x000000c820018000 in ?? ()
#6 0x0000000000000000 in ?? ()
At any time, two or three of the six hosts will show this behavior. When an instance is in this state, it appears to remain in it for days or weeks, and the behavior persists across reboots/restarts and minor upgrades.
top -b -n 1 | head
top - 09:19:52 up 137 days, 16:11, 8 users, load average: 1.04, 1.14, 1.21
Tasks: 499 total, 1 running, 497 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.6%us, 0.2%sy, 0.0%ni, 97.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264493604k total, 258651012k used, 5842592k free, 2846808k buffers
Swap: 0k total, 0k used, 0k free, 48732988k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43315 influxdb 20 0 4823m 1.3g 507m S 94.9 0.5 23:48.58 influxd
3062 daemon 20 0 80876 38m 3440 S 7.3 0.0 2783:19 .vasd
1 root 20 0 36932 1488 1100 S 0.0 0.0 0:13.84 init
Additional info:
I don't know what distinguishes an instance that exhibits this behavior from one that doesn't.
Possibly Irrelevant Observations:
At times, points might be written multiple times (rerunning an upload script).
At times the line-protocol files didn't have fields or tags in alphabetical order.
At times, due to human error, an upload script might send a gzipped file, creating measurements with non-ASCII names.