
one thread pinning a single core continuously #7447

Closed
pbooth opened this issue Oct 11, 2016 · 22 comments

pbooth commented Oct 11, 2016

Bug report

InfluxDB 1.0.0~rc2-1, 1.0.0-1, and 1.0.2-1 (but not 0.13.0-1 or earlier), running on RHEL 6.6 or 6.7 on dual Haswell / Ivy Bridge servers with 256 GB of memory and local storage or FusionIO.

Actual behavior: I see one thread pinning a single core continuously, without any queries running and no continuous queries defined.

This is what the problematic thread is doing:

Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x000000000042141d in runtime.shade ()
#1 0x0000000000414fe4 in runtime.gcmarkwb_m ()
#2 0x0000000000456494 in runtime.writebarrierptr_nostore1.func1 ()
#3 0x00000000004611ab in runtime.systemstack ()
#4 0x0000000000415081 in runtime.writebarrierptr_nostore1 ()
#5 0x0000000000415182 in runtime.writebarrierptr ()
#6 0x000000000045687c in runtime.typedslicecopy.func1 ()
#7 0x0000000000461179 in runtime.systemstack ()
#8 0x0000000000435710 in runtime.startTheWorldWithSema ()
#9 0x000000c820018000 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x0000000000415121 in runtime.writebarrierptr ()
#1 0x000000000045687c in runtime.typedslicecopy.func1 ()
#2 0x0000000000461179 in runtime.systemstack ()
#3 0x0000000000435710 in runtime.startTheWorldWithSema ()
#4 0x000000c820018000 in ?? ()
#5 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x00000000004163c2 in runtime.heapBitsBulkBarrier ()
#1 0x00000000004152e8 in runtime.typedmemmove ()
#2 0x00000000004567fa in runtime.typedslicecopy.func1 ()
#3 0x0000000000461179 in runtime.systemstack ()
#4 0x0000000000435710 in runtime.startTheWorldWithSema ()
#5 0x000000c820018000 in ?? ()
#6 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f20e6094700 (LWP 29016)):
#0 0x000000000041509e in runtime.writebarrierptr_nostore1 ()
#1 0x0000000000415182 in runtime.writebarrierptr ()
#2 0x000000000045688f in runtime.typedslicecopy.func1 ()
#3 0x0000000000461179 in runtime.systemstack ()
#4 0x0000000000435710 in runtime.startTheWorldWithSema ()
#5 0x000000c820018000 in ?? ()
#6 0x0000000000000000 in ?? ()

At any given time, two or three of the six hosts show this behavior. Once an instance enters this state it appears to remain in it for days or weeks, and the behavior persists across reboots/restarts and minor upgrades.

top -b -n 1 | head

top - 09:19:52 up 137 days, 16:11, 8 users, load average: 1.04, 1.14, 1.21
Tasks: 499 total, 1 running, 497 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.6%us, 0.2%sy, 0.0%ni, 97.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264493604k total, 258651012k used, 5842592k free, 2846808k buffers
Swap: 0k total, 0k used, 0k free, 48732988k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43315 influxdb 20 0 4823m 1.3g 507m S 94.9 0.5 23:48.58 influxd
3062 daemon 20 0 80876 38m 3440 S 7.3 0.0 2783:19 .vasd
1 root 20 0 36932 1488 1100 S 0.0 0.0 0:13.84 init

Additional info:

I don't know what distinguishes an instance that exhibits this behavior from one that doesn't.

Possibly Irrelevant Observations:

  1. The scripts that I use to upload data to InfluxDB initially used HTTP and now use UDP.
  2. At times, points might be written multiple times (rerunning an upload script).
  3. At times the line format files didn't have fields or tags in alphabetical order.
  4. At times, due to human error, an upload script might send a gzipped file, creating measurements with non-ASCII names.

jwilder commented Oct 11, 2016

@pbooth Can you run the following and attach them to the issue?

curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1" 
curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1" 
curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1" 
curl -o vars.txt "http://localhost:8086/debug/vars" 
iostat -xd 1 30 > iostat.txt
influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

pbooth commented Oct 11, 2016

influxBug.tar.gz

This tarball contains the requested files.

jwilder commented Oct 11, 2016

@pbooth Can you attach the logs for the last few hours where the CPU is pegged? Also, could you run find /var/lib/influxdb/data -type f -exec ls -la {} \; (or the equivalent for wherever you have your data dir configured) and attach the output?

It looks like one shard might be stuck in a full compaction loop, which was fixed in #7258, but there might be another case where this is occurring.

pbooth commented Oct 11, 2016

influxdb.txt.gz

Log and find output attached.

pbooth closed this as completed Oct 11, 2016
pbooth reopened this Oct 11, 2016
jwilder commented Oct 11, 2016

@pbooth Are you writing data to these hosts? You mentioned that:

At times, points might be written multiple times (rerunning an upload script).

This will cause very expensive compactions to remove the duplicate data, and they can take a while. It may also mean you are writing points back in time, causing cold shards to become hot again and require re-compaction.

pbooth commented Oct 11, 2016

So the issue continues for days/weeks after I finish writing.

I had assumed that writing points to InfluxDB was idempotent. It didn't occur to me that duplicate points could be written. By duplicate I mean strictly duplicate: the same line protocol data being uploaded to an endpoint.

jwilder commented Oct 11, 2016

The points are appended to the WAL and compacted into TSM files based on the time when they are stored. If the same points are written over and over, there are multiple copies of those points until they get compacted. You won't see the same data returned in queries though since we deduplicate the data before it's returned (if there are dups).

I only see one compaction started in the logs you attached, and the server looks like it was only running for about 20 minutes. I'd need to see a longer period of logs to tell whether it's the same issue as #7258. Full compactions do not start until a shard has been cold for 24 hours. If you are restarting the server frequently, a full compaction may be getting killed and having to restart each time.
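To make the overwrite and deduplication behavior described above concrete, here is a minimal sketch against the 1.x HTTP API. It assumes a local instance on port 8086 and the intake database mentioned in this thread; the cpu_load measurement, host tag, and timestamp are invented for the illustration.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "strings"
)

// writeLP posts a line protocol payload to the 1.x /write endpoint.
func writeLP(body string) error {
    resp, err := http.Post("http://localhost:8086/write?db=intake", "text/plain", strings.NewReader(body))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusNoContent {
        msg, _ := io.ReadAll(resp.Body)
        return fmt.Errorf("write failed: %s: %s", resp.Status, msg)
    }
    return nil
}

func main() {
    // Same measurement, same tag set, same explicit nanosecond timestamp:
    // this is "the same point", so writing it twice is an overwrite rather
    // than a second row.
    point := "cpu_load,host=app01 value=0.64 1476200000000000000\n"
    for i := 0; i < 2; i++ {
        if err := writeLP(point); err != nil {
            panic(err)
        }
    }

    // Querying it back returns a single row: duplicates are deduplicated at
    // query time, and the extra copies only disappear from disk once the
    // shard is compacted, which is why heavy overwriting makes compactions
    // expensive.
    q := url.Values{"db": {"intake"}, "q": {"SELECT * FROM cpu_load"}}
    resp, err := http.Get("http://localhost:8086/query?" + q.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}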

pbooth commented Oct 12, 2016

I can get more logs from this host and the other host that's currently showing this behavior. They typically run for a few weeks without restarting.


pbooth commented Oct 12, 2016

influxlog2.tar.gz
influxlog.tar.gz

These two tarballs contain seven days' worth of logs for the two problematic instances that are showing 100% CPU usage.

jwilder commented Oct 12, 2016

I'm not seeing any issue in the logs due to compactions constantly running as was happening in #7258. It looks like new compactions are getting scheduled as expected. As I already mentioned, overwriting points causes compactions to be very expensive and can tie up a single core for a while until they complete. Your intake/eight_weeks/513 shard looks like it was running a full compaction at the time of these logs.

Could you run:

influx_inspect dumptsm -all /var/lib/influxdb/data/intake/eight_weeks/513/000000087-000000005.tsm

pbooth commented Oct 13, 2016

dumptsm.txt.gz

This is it ...

jwilder commented Oct 13, 2016

@pbooth It looks like you are writing to a single series at nanosecond precision. If you are overwriting points, that entire file ends up getting re-compacted, which will tie up a core until it completes.

pbooth commented Oct 14, 2016

Some more data:

The more serious consequence of this issue isn't that it uses up a whole core; it's that it makes the CLI non-responsive for a range of commands.

I realized that the influx_inspect dumptsm command you asked me to run refers to the retention policy "eight_weeks". This is an old, now unused retention policy, and any overwritten points could only have been in a shard within the default four_years retention policy.

On a hunch I tried the following:

  1. Restarted influx (necessary for the influx CLI to be responsive).
  2. Connected with the CLI and did a DROP SHARD for all of the shards within the unused retention policies.
  3. The CLI was still responsive, though the process was pinning a single core.
  4. Did a DROP RETENTION POLICY eight_weeks ON intake.

This request didn't return to the CLI, and when I started another CLI session it wasn't responsive.
iostat -xd 1 70 doesn't show any activity on the FusionIO disk where InfluxDB is installed.
Note that this disk supports over 200k IOPS, more than 1000x the throughput of a conventional disk.

Does the fact that CPU usage jumps to 100% as soon as InfluxDB is restarted imply that full compactions can't be the cause, if full compactions don't start until 24 hours after a restart?

Something I didn't expect: after doing a DROP SHARD for every shard in the eight_weeks and three_years retention policies, I restarted influx. The DROP SHARD didn't actually remove the TSM files, and I see that the influxdb process still has these files open. Is this expected?

pbooth commented Nov 21, 2016

An update:

I have an InfluxDB instance that has been running since 2016/11/18 13:18:46.

At 2016/11/18 16:32:06 the disk filled up, so it began logging errors like these:

[write] 2016/11/18 16:32:06 write failed for shard 669: engine: error writing WAL entry: write /var/lib/influxdb/wal/intake/four_years/669/_00042.wal: no space left on device
[udp] 2016/11/18 16:32:06 failed to write point batch to database "intake": engine: error writing WAL entry: write /var/lib/influxdb/wal/intake/four_years/669/_00042.wal: no space left on device
[write] 2016/11/18 16:33:06 write failed for shard 669: engine: error writing WAL entry: write /var/lib/influxdb/wal/intake/four_years/669/_00042.wal: no space left on device

Then at 2016/11/19 13:18:46, 24 hours after InfluxDB started, it began trying to do full compactions (normal behavior):

[tsm1] 2016/11/19 13:18:46 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/19 13:18:48 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/19 13:18:49 beginning full compaction of group 0, 2 TSM files

Then at approximately 2016/11/21 11:03 I noticed the problem and freed up some space.

Twenty minutes later it logged "database index loaded" and then attempted a full compaction, apparently failing with "aborting compaction".

Then 30 minutes later it began a loop of attempting and aborting compactions, which has repeated 347 times:

[tsm1] 2016/11/21 11:56:40 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000016-000000004.tsm (#0)
[tsm1] 2016/11/21 11:56:40 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000021-000000003.tsm (#1)
[tsm1] 2016/11/21 11:56:40 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:40 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:40 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000033-000000002.tsm (#0)
[tsm1] 2016/11/21 11:56:40 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000034-000000001.tsm (#1)
[tsm1] 2016/11/21 11:56:40 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:41 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:41 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000016-000000004.tsm (#0)
[tsm1] 2016/11/21 11:56:41 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000021-000000003.tsm (#1)
[tsm1] 2016/11/21 11:56:41 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:41 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:41 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000033-000000002.tsm (#0)
[tsm1] 2016/11/21 11:56:41 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000034-000000001.tsm (#1)
[tsm1] 2016/11/21 11:56:41 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:42 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:42 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000016-000000004.tsm (#0)
[tsm1] 2016/11/21 11:56:42 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000021-000000003.tsm (#1)
[tsm1] 2016/11/21 11:56:42 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:42 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:42 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000033-000000002.tsm (#0)
[tsm1] 2016/11/21 11:56:42 compacting full group (0) /var/lib/influxdb/data/intake/four_years/668/000000034-000000001.tsm (#1)
[tsm1] 2016/11/21 11:56:42 aborted full compaction group (0). compaction in progress
[tsm1] 2016/11/21 11:56:43 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/11/21 11:56:43 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000016-000000004.tsm (#0)
[tsm1] 2016/11/21 11:56:43 compacting full group (0) /var/lib/influxdb/data/intake/four_years/669/000000021-000000003.tsm (#1)
[tsm1] 2016/11/21 11:56:43 aborted full compaction group (0). compaction in progress

Should I restart the instance with the configuration changed so that it begins full compactions after one hour instead of 24?

Any suggestions?

jwilder commented Nov 21, 2016

You can ignore the aborted compaction log messages. See #7425

Pomyk commented Mar 15, 2017

Might be related: https://groups.google.com/forum/#!topic/golang-dev/PVwDFD7gDuk
Possibly a non-preemptible loop in the Merge() function, which also looks suboptimal; the merge can be done in O(n).
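For reference, an O(n) merge of two timestamp-sorted blocks is the classic two-pointer scan. The sketch below is illustrative only and is not InfluxDB's actual Merge() code; the simplified Value type and the last-write-wins rule for duplicate timestamps are assumptions made for the example.

package main

import "fmt"

// Value is a simplified stand-in for a timestamped point.
type Value struct {
    UnixNano int64
    V        float64
}

// merge combines two blocks already sorted by timestamp in O(len(a)+len(b)),
// letting values from b overwrite values from a that share a timestamp.
func merge(a, b []Value) []Value {
    out := make([]Value, 0, len(a)+len(b))
    i, j := 0, 0
    for i < len(a) && j < len(b) {
        switch {
        case a[i].UnixNano < b[j].UnixNano:
            out = append(out, a[i])
            i++
        case a[i].UnixNano > b[j].UnixNano:
            out = append(out, b[j])
            j++
        default: // duplicate timestamp: the newer block wins
            out = append(out, b[j])
            i++
            j++
        }
    }
    out = append(out, a[i:]...)
    return append(out, b[j:]...)
}

func main() {
    a := []Value{{1, 1.0}, {3, 3.0}, {5, 5.0}}
    b := []Value{{2, 2.0}, {3, 3.5}}
    fmt.Println(merge(a, b)) // [{1 1} {2 2} {3 3.5} {5 5}]
}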

@mark-rushakoff

@Pomyk Yes, the merge performance fix is in #8131, due to be in the 1.3.0 release.

jwilder commented Mar 15, 2017

In the issue description, the runtime.typedslicecopy frames suggest that Merge might be the root issue.

#7942 and specifically 784a851 in that PR should also help here.

Pomyk commented Mar 15, 2017

@mark-rushakoff I didn't notice the PR.

@jwilder runtime.Gosched() would help in the loop of Merge() if it is merging a lot of data. I don't think it makes much sense in combine(), as its loop can be preempted on the next iteration.
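To illustrate the preemption point: at the time, goroutine preemption was cooperative, so a long loop with no call sites could hold a core and delay the runtime (for example, the stop-the-world phases visible in the stack traces above). Below is a sketch, not the actual fix, of yielding periodically from a CPU-bound loop; sumWithYield and the yield interval are invented for the illustration.

package main

import (
    "fmt"
    "runtime"
)

// sumWithYield is a CPU-bound loop that calls runtime.Gosched() every 64k
// iterations so the scheduler can run other goroutines and the garbage
// collector is not stalled waiting for this goroutine to reach a safe point.
func sumWithYield(xs []int64) int64 {
    var total int64
    for i, x := range xs {
        total += x
        if i&0xFFFF == 0xFFFF {
            runtime.Gosched() // voluntarily yield the processor
        }
    }
    return total
}

func main() {
    xs := make([]int64, 1<<22)
    for i := range xs {
        xs[i] = int64(i)
    }
    fmt.Println(sumWithYield(xs))
}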

jwilder commented Mar 15, 2017

@Pomyk The preemptible merge issue was fixed in #8116, which avoids merging too much data at once. #8131 was also just merged, which will help as well.

Pomyk commented Mar 15, 2017

Nice, thanks. I guess this can be closed then.

jwilder added this to the 1.3.0 milestone Mar 15, 2017
jwilder commented Mar 15, 2017

This should be fixed via #8116 and #8131.

jwilder closed this as completed Mar 15, 2017