
Packaging tests fail with "Gradle build daemon disappeared unexpectedly" #44623

Closed
andrershov opened this issue Jul 19, 2019 · 10 comments
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Delivery Meta label for Delivery team >test-failure Triaged test failures from CI


@andrershov
Contributor

There were a bunch of packaging test failures on different ES versions with the following error: "Gradle build daemon disappeared unexpectedly".

For example,
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.2+packaging-tests/173/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+packaging-tests/169/console

@andrershov andrershov added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI labels Jul 19, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@mark-vieira
Contributor

It seems not all metal workers are created equal. We have a mix of workers with 32G of memory and others with 64G. That, in conjunction with our recent daemon leaking issue, is probably causing the OOM killer to blow away the Gradle daemon.
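One way to check the daemon-leak theory against a kernel OOM report is to total the `rss` column of the task dump per process name. The kernel prints `rss` in pages. The sketch below is a minimal Python version and rests on a few assumptions:

* pages are 4 KiB (the x86_64 default);
* rows follow the column layout of the 4.13 kernel dump pasted in this thread;
* the name `rss_by_name` is hypothetical.

```python
import re
from collections import defaultdict

# Matches one row of the kernel OOM task dump, e.g.
# "[1248974]  1106 1248974  2035514   714527    1625      10        0             0 java"
# Columns: pid, uid, tgid, total_vm, rss, nr_ptes, nr_pmds, swapents, oom_score_adj, name.
# The syslog timestamp "[40375154.691386]" contains a dot, so it never matches \[\s*(\d+)\].
TASK_ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(.+)$"
)

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages (x86_64 default)


def rss_by_name(lines):
    """Sum the rss column (in pages) per process name and return KiB per name."""
    totals = defaultdict(int)
    for line in lines:
        m = TASK_ROW.search(line)
        if m is None:
            continue  # header, Mem-Info, call trace, or other non-task lines
        rss_pages = int(m.group(5))
        name = m.group(10).strip()
        totals[name] += rss_pages * PAGE_SIZE_KIB
    return dict(totals)
```

Applied to every row of the first dump in this thread, the 14 `java` processes add up to 5,730,959 pages. That is roughly 22 GiB resident on a worker with about 32 GiB of RAM. This is consistent with the leak theory, although the dump alone does not say which of those processes are Gradle daemons.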

@mark-vieira
Contributor

I've opened up https://github.com/elastic/infra/issues/13356 to have infra look into beefing up all our metal workers such that they have 64G of RAM.

@droberts195
Contributor

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+packaging-tests/278/console is another case of this error occurring on a worker with 32GB RAM (https://elasticsearch-ci.elastic.co/computer/worker-854308).

But https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+packaging-tests/277/console is a case of the same Jenkins job succeeding on a worker with 32GB RAM (https://elasticsearch-ci.elastic.co/computer/worker-854313).

So it sometimes works on 32GB machines.
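A worker's RAM size can also be confirmed from an OOM report. The kernel prints a `N pages RAM` line, and multiplying N by the page size gives the total. The sketch below is a minimal Python version. It assumes 4 KiB pages, and the helper name `ram_gib_from_oom_report` is hypothetical.

```python
import re

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages (x86_64 default)


def ram_gib_from_oom_report(lines):
    """Return RAM in GiB from the first 'N pages RAM' line of an OOM report, or None."""
    for line in lines:
        m = re.search(r"\]\s+(\d+) pages RAM\s*$", line)
        if m:
            return int(m.group(1)) * PAGE_SIZE_BYTES / 2**30
    return None
```

For worker-854308, the report below shows 8333124 pages RAM. That works out to about 31.8 GiB, consistent with a 32GB machine.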

But it's true that the cause of failure in build 278 is the OOM killer. I looked in the syslog of worker-854308 at the time the Gradle daemon died (note that Jenkins uses UTC while worker-854308 is on Central European time, which is 2 hours ahead in August):

Aug 15 04:18:51 worker-854308 kernel: [40375149.361036] vboxdrv: 0000000000000000 VMMR0.r0
Aug 15 04:18:51 worker-854308 kernel: [40375149.465059] vboxdrv: 0000000000000000 VBoxDDR0.r0
Aug 15 04:18:51 worker-854308 kernel: [40375149.486977] audit: audit_lost=111737352 audit_rate_limit=500 audit_backlog_limit=8192
Aug 15 04:18:56 worker-854308 kernel: [40375154.691201] EMT-1 invoked oom-killer: gfp_mask=0x16042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK), nodemask=(null),  order=1, oom_score_adj=0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691203] EMT-1 cpuset=/ mems_allowed=0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691206] CPU: 7 PID: 1299703 Comm: EMT-1 Tainted: G           OE   4.13.0-36-generic #40~16.04.1-Ubuntu
Aug 15 04:18:56 worker-854308 kernel: [40375154.691207] Hardware name: FUJITSU  /D3401-H2, BIOS V5.0.0.12 R1.14.0 for D3401-H2x                    10/24/2017
Aug 15 04:18:56 worker-854308 kernel: [40375154.691207] Call Trace:
Aug 15 04:18:56 worker-854308 kernel: [40375154.691211]  dump_stack+0x63/0x8b
Aug 15 04:18:56 worker-854308 kernel: [40375154.691213]  dump_header+0x97/0x225
Aug 15 04:18:56 worker-854308 kernel: [40375154.691216]  ? security_capable_noaudit+0x4b/0x70
Aug 15 04:18:56 worker-854308 kernel: [40375154.691217]  oom_kill_process+0x219/0x420
Aug 15 04:18:56 worker-854308 kernel: [40375154.691218]  out_of_memory+0x11d/0x4b0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691219]  __alloc_pages_slowpath+0xd32/0xe10
Aug 15 04:18:56 worker-854308 kernel: [40375154.691221]  __alloc_pages_nodemask+0x263/0x280
Aug 15 04:18:56 worker-854308 kernel: [40375154.691223]  alloc_pages_current+0x6a/0xe0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691225]  new_slab+0x31e/0x650
Aug 15 04:18:56 worker-854308 kernel: [40375154.691226]  ___slab_alloc+0x267/0x4c0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691234]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691238]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691240]  __slab_alloc+0x20/0x40
Aug 15 04:18:56 worker-854308 kernel: [40375154.691241]  ? __slab_alloc+0x20/0x40
Aug 15 04:18:56 worker-854308 kernel: [40375154.691242]  __kmalloc+0x190/0x200
Aug 15 04:18:56 worker-854308 kernel: [40375154.691245]  rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691249]  VBoxHost_RTMemAllocZTag+0x2a/0x60 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691252]  ? ttwu_do_wakeup+0x1e/0x150
Aug 15 04:18:56 worker-854308 kernel: [40375154.691254]  ? __kmalloc+0x163/0x200
Aug 15 04:18:56 worker-854308 kernel: [40375154.691258]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691261]  ? supdrvIOCtl+0x1837/0x3480 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691263]  ? __check_object_size+0xfc/0x1a0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691265]  ? _copy_from_user+0x36/0x70
Aug 15 04:18:56 worker-854308 kernel: [40375154.691268]  ? VBoxDrvLinuxIOCtl_5_2_28+0x160/0x250 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.691270]  ? do_vfs_ioctl+0xa4/0x600
Aug 15 04:18:56 worker-854308 kernel: [40375154.691272]  ? getnstimeofday64+0xe/0x20
Aug 15 04:18:56 worker-854308 kernel: [40375154.691274]  ? __audit_syscall_entry+0xaf/0xf0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691276]  ? syscall_trace_enter+0x1d9/0x2f0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691277]  ? SyS_ioctl+0x79/0x90
Aug 15 04:18:56 worker-854308 kernel: [40375154.691278]  ? do_syscall_64+0x61/0xd0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691280]  ? entry_SYSCALL64_slow_path+0x25/0x25
Aug 15 04:18:56 worker-854308 kernel: [40375154.691281] Mem-Info:
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283] active_anon:5869297 inactive_anon:50425 isolated_anon:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283]  active_file:567 inactive_file:1200 isolated_file:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283]  unevictable:0 dirty:28 writeback:0 unstable:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283]  slab_reclaimable:24638 slab_unreclaimable:44444
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283]  mapped:2084532 shmem:75303 pagetables:20043 bounce:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691283]  free:50479 free_pcp:35 free_cma:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691285] Node 0 active_anon:23477188kB inactive_anon:201700kB active_file:2268kB inactive_file:4800kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:8338128kB dirty:112kB writeback:0kB shmem:301212kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14860288kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Aug 15 04:18:56 worker-854308 kernel: [40375154.691286] Node 0 DMA free:15896kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691288] lowmem_reserve[]: 0 2242 31887 31887 31887
Aug 15 04:18:56 worker-854308 kernel: [40375154.691290] Node 0 DMA32 free:123364kB min:4836kB low:7176kB high:9516kB active_anon:1761888kB inactive_anon:0kB active_file:0kB inactive_file:248kB unevictable:0kB writepending:4kB present:2408100kB managed:2342532kB mlocked:0kB kernel_stack:16kB pagetables:868kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691292] lowmem_reserve[]: 0 0 29644 29644 29644
Aug 15 04:18:56 worker-854308 kernel: [40375154.691294] Node 0 Normal free:62656kB min:62712kB low:93072kB high:123432kB active_anon:21715300kB inactive_anon:201700kB active_file:2448kB inactive_file:4416kB unevictable:0kB writepending:336kB present:30908416kB managed:30360712kB mlocked:0kB kernel_stack:15632kB pagetables:79304kB bounce:0kB free_pcp:180kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691296] lowmem_reserve[]: 0 0 0 0 0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691297] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691303] Node 0 DMA32: 175*4kB (UME) 131*8kB (UME) 162*16kB (UME) 151*32kB (UME) 125*64kB (UME) 108*128kB (ME) 76*256kB (ME) 49*512kB (UME) 43*1024kB (UM) 0*2048kB 1*4096kB (H) = 123668kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691310] Node 0 Normal: 11975*4kB (UME) 2027*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64116kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691315] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691316] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691316] 77213 total pagecache pages
Aug 15 04:18:56 worker-854308 kernel: [40375154.691317] 0 pages in swap cache
Aug 15 04:18:56 worker-854308 kernel: [40375154.691317] Swap cache stats: add 0, delete 0, find 0/0
Aug 15 04:18:56 worker-854308 kernel: [40375154.691317] Free swap  = 0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691318] Total swap = 0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.691318] 8333124 pages RAM
Aug 15 04:18:56 worker-854308 kernel: [40375154.691318] 0 pages HighMem/MovableOnly
Aug 15 04:18:56 worker-854308 kernel: [40375154.691319] 153339 pages reserved
Aug 15 04:18:56 worker-854308 kernel: [40375154.691319] 0 pages cma reserved
Aug 15 04:18:56 worker-854308 kernel: [40375154.691319] 0 pages hwpoisoned
Aug 15 04:18:56 worker-854308 kernel: [40375154.691320] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Aug 15 04:18:56 worker-854308 kernel: [40375154.691326] [  370]     0   370    44175       98      20       3        0             0 lvmetad
Aug 15 04:18:56 worker-854308 kernel: [40375154.691327] [  386]     0   386     3051      781      11       3        0             0 haveged
Aug 15 04:18:56 worker-854308 kernel: [40375154.691328] [  387]     0   387    11154     1237      25       4        0             0 systemd-journal
Aug 15 04:18:56 worker-854308 kernel: [40375154.691329] [  580]     0   580     7154       85      19       3        0             0 systemd-logind
Aug 15 04:18:56 worker-854308 kernel: [40375154.691330] [  583]   108   583    10761      179      25       4        0          -900 dbus-daemon
Aug 15 04:18:56 worker-854308 kernel: [40375154.691331] [  599]   105   599    64098      413      27       3        0             0 rsyslogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691333] [  602]     0   602     6511       50      18       3        0             0 atd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691334] [  605]     0   605     7940       78      19       3        0             0 cron
Aug 15 04:18:56 worker-854308 kernel: [40375154.691335] [  639]     0   639     3380       76      11       3        0             0 mdadm
Aug 15 04:18:56 worker-854308 kernel: [40375154.691336] [  996]     0   996     4672       32      13       3        0             0 agetty
Aug 15 04:18:56 worker-854308 kernel: [40375154.691337] [25284]   107 25284     7727       48      18       3        0             0 uuidd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691338] [ 8621]     0  8621     1098       22       8       3        0             0 runsvdir
Aug 15 04:18:56 worker-854308 kernel: [40375154.691339] [ 8820]     0  8820     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691340] [ 8821]     0  8821     1096       23       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691341] [ 9103]     0  9103     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691342] [ 9104]     0  9104     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691343] [ 9273]     0  9273     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691344] [ 9274]     0  9274     1096       22       7       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691345] [13102]     0 13102     1060       17       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691346] [13103]     0 13103     1096       21       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691347] [14379]     0 14379     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691348] [14411]     0 14411   158853     6883      55       5        0             0 packetbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691350] [25850]     0 25850     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691351] [25851]     0 25851     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691352] [25852]     0 25852     1635       28      10       3        0             0 nessus-service
Aug 15 04:18:56 worker-854308 kernel: [40375154.691353] [29730]     0 29730     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691354] [29731]     0 29731     1096       23       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691355] [ 9472]   109  9472    27508      160      26       3        0             0 ntpd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691356] [ 3604]     0  3604     4399     1403      13       5        0             0 factbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691357] [ 3633]     0  3633   117963     1980      37       5        0             0 topbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691358] [15527]     0 15527   251229     8901     106       6        0          -500 dockerd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691359] [15537]     0 15537   268398     4849      78       6        0          -500 docker-containe
Aug 15 04:18:56 worker-854308 kernel: [40375154.691360] [25964]     0 25964     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.691361] [25965]     0 25965     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691362] [26487]  1106 26487     1096       19       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691363] [1048240]     0 1048240   193017     4020      67       6        0             0 filebeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691365] [2422854]     0 2422854    16378      178      34       3        0         -1000 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691366] [2696372]     0 2696372    11170      171      22       3        0         -1000 systemd-udevd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691367] [673274]     0 673274   569049    12802     160       7        0             0 auditbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691369] [941563]  1106 941563  3546049   169260     495      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691370] [2313492]  1106 2313492  2293568   579428    1330      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691371] [2319132]  1106 2319132  2292391   188140     568      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691372] [2456594]  1106 2456594  2293901   572136    1356      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691373] [3191852]  1106 3191852  2293568   545963    1296      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691374] [3369237]  1106 3369237  2306172   510748    1224      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691375] [4143152]     0 4143152    84939     9013      62       3        0             0 qualys-cloud-ag
Aug 15 04:18:56 worker-854308 kernel: [40375154.691376] [509199]  1106 509199  2296718   335837     922      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691377] [860743]  1106 860743  2293130   540922    1291      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691378] [1196891]  1106 1196891  2293862   577229    1354      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691379] [1205722]  1106 1205722  2293350   576488    1342      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691380] [3898646]  1106 3898646  2293095   276289     784      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691381] [1988107]     0 1988107   438991     7391     102       7        0             0 metricbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.691383] [1110621]     0 1110621    76387    10462      97       4        0             0 nessusd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691384] [1248672]  1106 1248672  3052529   124774     368       9        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691385] [1248917]  1106 1248917     5938       83      17       3        0             0 bash
Aug 15 04:18:56 worker-854308 kernel: [40375154.691386] [1248974]  1106 1248974  2035514   714527    1625      10        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691388] [1249596]  1106 1249596   832226    19218     145       7        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.691389] [1250343]  1106 1250343    42447      675      69       4        0             0 VBoxXPCOMIPCD
Aug 15 04:18:56 worker-854308 kernel: [40375154.691390] [1286665]  1106 1286665   187023     1797     102       4        0             0 VBoxSVC
Aug 15 04:18:56 worker-854308 kernel: [40375154.691391] [1297643]     0 1297643    23201      262      49       3        0             0 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691392] [1297645]  1000 1297645    11322      178      27       3        0             0 systemd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691393] [1297646]  1000 1297646    15327      497      31       3        0             0 (sd-pam)
Aug 15 04:18:56 worker-854308 kernel: [40375154.691394] [1297656]  1000 1297656    23235      259      48       3        0             0 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.691395] [1298302]  1106 1298302     5939       78      16       3        0             0 bash
Aug 15 04:18:56 worker-854308 kernel: [40375154.691396] [1298303]  1106 1298303      908      140       6       5        0             0 vagrant
Aug 15 04:18:56 worker-854308 kernel: [40375154.691397] [1298308]  1106 1298308   138208    13389      91       3        0             0 ruby
Aug 15 04:18:56 worker-854308 kernel: [40375154.691398] [1299694]  1106 1299694  2429872   138825    4243      12        0             0 VBoxHeadless
Aug 15 04:18:56 worker-854308 kernel: [40375154.691399] Out of memory: Kill process 1248974 (java) score 87 or sacrifice child
Aug 15 04:18:56 worker-854308 kernel: [40375154.691549] Killed process 1298302 (bash) total-vm:23756kB, anon-rss:244kB, file-rss:68kB, shmem-rss:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700825] EMT-1 invoked oom-killer: gfp_mask=0x16042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK), nodemask=(null),  order=1, oom_score_adj=0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700827] EMT-1 cpuset=/ mems_allowed=0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700829] CPU: 7 PID: 1299703 Comm: EMT-1 Tainted: G           OE   4.13.0-36-generic #40~16.04.1-Ubuntu
Aug 15 04:18:56 worker-854308 kernel: [40375154.700829] Hardware name: FUJITSU  /D3401-H2, BIOS V5.0.0.12 R1.14.0 for D3401-H2x                    10/24/2017
Aug 15 04:18:56 worker-854308 kernel: [40375154.700830] Call Trace:
Aug 15 04:18:56 worker-854308 kernel: [40375154.700832]  dump_stack+0x63/0x8b
Aug 15 04:18:56 worker-854308 kernel: [40375154.700834]  dump_header+0x97/0x225
Aug 15 04:18:56 worker-854308 kernel: [40375154.700836]  ? security_capable_noaudit+0x4b/0x70
Aug 15 04:18:56 worker-854308 kernel: [40375154.700837]  oom_kill_process+0x219/0x420
Aug 15 04:18:56 worker-854308 kernel: [40375154.700839]  out_of_memory+0x11d/0x4b0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700840]  __alloc_pages_slowpath+0xd32/0xe10
Aug 15 04:18:56 worker-854308 kernel: [40375154.700842]  __alloc_pages_nodemask+0x263/0x280
Aug 15 04:18:56 worker-854308 kernel: [40375154.700843]  alloc_pages_current+0x6a/0xe0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700845]  new_slab+0x31e/0x650
Aug 15 04:18:56 worker-854308 kernel: [40375154.700846]  ___slab_alloc+0x267/0x4c0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700853]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700857]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700858]  __slab_alloc+0x20/0x40
Aug 15 04:18:56 worker-854308 kernel: [40375154.700859]  ? __slab_alloc+0x20/0x40
Aug 15 04:18:56 worker-854308 kernel: [40375154.700860]  __kmalloc+0x190/0x200
Aug 15 04:18:56 worker-854308 kernel: [40375154.700864]  rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700868]  VBoxHost_RTMemAllocZTag+0x2a/0x60 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700871]  ? ttwu_do_wakeup+0x1e/0x150
Aug 15 04:18:56 worker-854308 kernel: [40375154.700874]  ? __kmalloc+0x163/0x200
Aug 15 04:18:56 worker-854308 kernel: [40375154.700879]  ? rtR0MemAllocEx+0x175/0x240 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700882]  ? supdrvIOCtl+0x1837/0x3480 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700884]  ? __check_object_size+0xfc/0x1a0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700886]  ? _copy_from_user+0x36/0x70
Aug 15 04:18:56 worker-854308 kernel: [40375154.700889]  ? VBoxDrvLinuxIOCtl_5_2_28+0x160/0x250 [vboxdrv]
Aug 15 04:18:56 worker-854308 kernel: [40375154.700890]  ? do_vfs_ioctl+0xa4/0x600
Aug 15 04:18:56 worker-854308 kernel: [40375154.700892]  ? getnstimeofday64+0xe/0x20
Aug 15 04:18:56 worker-854308 kernel: [40375154.700893]  ? __audit_syscall_entry+0xaf/0xf0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700894]  ? syscall_trace_enter+0x1d9/0x2f0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700896]  ? SyS_ioctl+0x79/0x90
Aug 15 04:18:56 worker-854308 kernel: [40375154.700897]  ? do_syscall_64+0x61/0xd0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700898]  ? entry_SYSCALL64_slow_path+0x25/0x25
Aug 15 04:18:56 worker-854308 kernel: [40375154.700899] Mem-Info:
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901] active_anon:5869229 inactive_anon:50425 isolated_anon:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901]  active_file:741 inactive_file:2377 isolated_file:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901]  unevictable:0 dirty:29 writeback:0 unstable:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901]  slab_reclaimable:23500 slab_unreclaimable:44444
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901]  mapped:2084425 shmem:75303 pagetables:20027 bounce:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700901]  free:50657 free_pcp:14 free_cma:0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700903] Node 0 active_anon:23476916kB inactive_anon:201700kB active_file:2964kB inactive_file:9508kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:8337700kB dirty:116kB writeback:0kB shmem:301212kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14860288kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Aug 15 04:18:56 worker-854308 kernel: [40375154.700904] Node 0 DMA free:15896kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700906] lowmem_reserve[]: 0 2242 31887 31887 31887
Aug 15 04:18:56 worker-854308 kernel: [40375154.700908] Node 0 DMA32 free:123168kB min:4836kB low:7176kB high:9516kB active_anon:1761888kB inactive_anon:0kB active_file:172kB inactive_file:832kB unevictable:0kB writepending:4kB present:2408100kB managed:2342532kB mlocked:0kB kernel_stack:16kB pagetables:868kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700910] lowmem_reserve[]: 0 0 29644 29644 29644
Aug 15 04:18:56 worker-854308 kernel: [40375154.700911] Node 0 Normal free:63564kB min:62712kB low:93072kB high:123432kB active_anon:21715028kB inactive_anon:201700kB active_file:2748kB inactive_file:8980kB unevictable:0kB writepending:332kB present:30908416kB managed:30360712kB mlocked:0kB kernel_stack:15616kB pagetables:79240kB bounce:0kB free_pcp:60kB local_pcp:0kB free_cma:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700913] lowmem_reserve[]: 0 0 0 0 0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700915] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700921] Node 0 DMA32: 69*4kB (UME) 82*8kB (UME) 143*16kB (UME) 139*32kB (UME) 120*64kB (UME) 101*128kB (ME) 77*256kB (ME) 51*512kB (UME) 44*1024kB (UME) 0*2048kB 1*4096kB (H) = 123252kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700927] Node 0 Normal: 10797*4kB (UME) 2044*8kB (UM) 242*16kB (UME) 9*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63700kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700933] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700933] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700934] 78576 total pagecache pages
Aug 15 04:18:56 worker-854308 kernel: [40375154.700934] 0 pages in swap cache
Aug 15 04:18:56 worker-854308 kernel: [40375154.700935] Swap cache stats: add 0, delete 0, find 0/0
Aug 15 04:18:56 worker-854308 kernel: [40375154.700935] Free swap  = 0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700935] Total swap = 0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.700936] 8333124 pages RAM
Aug 15 04:18:56 worker-854308 kernel: [40375154.700936] 0 pages HighMem/MovableOnly
Aug 15 04:18:56 worker-854308 kernel: [40375154.700936] 153339 pages reserved
Aug 15 04:18:56 worker-854308 kernel: [40375154.700937] 0 pages cma reserved
Aug 15 04:18:56 worker-854308 kernel: [40375154.700937] 0 pages hwpoisoned
Aug 15 04:18:56 worker-854308 kernel: [40375154.700937] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Aug 15 04:18:56 worker-854308 kernel: [40375154.700940] [  370]     0   370    44175       98      20       3        0             0 lvmetad
Aug 15 04:18:56 worker-854308 kernel: [40375154.700941] [  386]     0   386     3051      781      11       3        0             0 haveged
Aug 15 04:18:56 worker-854308 kernel: [40375154.700942] [  387]     0   387    11154     1229      25       4        0             0 systemd-journal
Aug 15 04:18:56 worker-854308 kernel: [40375154.700943] [  580]     0   580     7154       83      19       3        0             0 systemd-logind
Aug 15 04:18:56 worker-854308 kernel: [40375154.700945] [  583]   108   583    10761      179      25       4        0          -900 dbus-daemon
Aug 15 04:18:56 worker-854308 kernel: [40375154.700945] [  599]   105   599    64098      413      27       3        0             0 rsyslogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700947] [  602]     0   602     6511       50      18       3        0             0 atd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700947] [  605]     0   605     7940       72      19       3        0             0 cron
Aug 15 04:18:56 worker-854308 kernel: [40375154.700949] [  639]     0   639     3380       76      11       3        0             0 mdadm
Aug 15 04:18:56 worker-854308 kernel: [40375154.700950] [  996]     0   996     4672       32      13       3        0             0 agetty
Aug 15 04:18:56 worker-854308 kernel: [40375154.700951] [25284]   107 25284     7727       48      18       3        0             0 uuidd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700952] [ 8621]     0  8621     1098       22       8       3        0             0 runsvdir
Aug 15 04:18:56 worker-854308 kernel: [40375154.700953] [ 8820]     0  8820     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700954] [ 8821]     0  8821     1096       23       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700955] [ 9103]     0  9103     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700956] [ 9104]     0  9104     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700957] [ 9273]     0  9273     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700958] [ 9274]     0  9274     1096       22       7       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700959] [13102]     0 13102     1060       17       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700960] [13103]     0 13103     1096       21       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700961] [14379]     0 14379     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700962] [14411]     0 14411   158853     6883      55       5        0             0 packetbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700963] [25850]     0 25850     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700964] [25851]     0 25851     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700965] [25852]     0 25852     1635       28      10       3        0             0 nessus-service
Aug 15 04:18:56 worker-854308 kernel: [40375154.700966] [29730]     0 29730     1060       19       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700966] [29731]     0 29731     1096       23       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700968] [ 9472]   109  9472    27508      160      26       3        0             0 ntpd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700968] [ 3604]     0  3604     4399     1403      13       5        0             0 factbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700969] [ 3633]     0  3633   117963     1980      37       5        0             0 topbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700970] [15527]     0 15527   251229     8901     106       6        0          -500 dockerd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700971] [15537]     0 15537   268398     4849      78       6        0          -500 docker-containe
Aug 15 04:18:56 worker-854308 kernel: [40375154.700972] [25964]     0 25964     1060       18       7       3        0             0 runsv
Aug 15 04:18:56 worker-854308 kernel: [40375154.700973] [25965]     0 25965     1096       22       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700974] [26487]  1106 26487     1096       19       8       3        0             0 svlogd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700975] [1048240]     0 1048240   193017     4020      67       6        0             0 filebeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700976] [2422854]     0 2422854    16378      178      34       3        0         -1000 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700977] [2696372]     0 2696372    11170      163      22       3        0         -1000 systemd-udevd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700978] [673274]     0 673274   569049    12802     160       7        0             0 auditbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700980] [941563]  1106 941563  3546049   169260     495      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700981] [2313492]  1106 2313492  2293568   579428    1330      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700982] [2319132]  1106 2319132  2292391   188140     568      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700983] [2456594]  1106 2456594  2293901   572136    1356      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700983] [3191852]  1106 3191852  2293568   545963    1296      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700984] [3369237]  1106 3369237  2306172   510748    1224      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700985] [4143152]     0 4143152    84939     9013      62       3        0             0 qualys-cloud-ag
Aug 15 04:18:56 worker-854308 kernel: [40375154.700986] [509199]  1106 509199  2296718   335837     922      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700987] [860743]  1106 860743  2293130   540922    1291      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700988] [1196891]  1106 1196891  2293862   577229    1354      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700989] [1205722]  1106 1205722  2293350   576488    1342      12        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700990] [3898646]  1106 3898646  2293095   276289     784      11        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700991] [1988107]     0 1988107   438991     7391     102       7        0             0 metricbeat
Aug 15 04:18:56 worker-854308 kernel: [40375154.700992] [1110621]     0 1110621    76387    10462      97       4        0             0 nessusd
Aug 15 04:18:56 worker-854308 kernel: [40375154.700993] [1248672]  1106 1248672  3052529   124774     368       9        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700994] [1248917]  1106 1248917     5938       75      17       3        0             0 bash
Aug 15 04:18:56 worker-854308 kernel: [40375154.700995] [1248974]  1106 1248974  2035514   714527    1625      10        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700996] [1249596]  1106 1249596   832226    19218     145       7        0             0 java
Aug 15 04:18:56 worker-854308 kernel: [40375154.700997] [1250343]  1106 1250343    42447      653      69       4        0             0 VBoxXPCOMIPCD
Aug 15 04:18:56 worker-854308 kernel: [40375154.700998] [1286665]  1106 1286665   187023     1764     102       4        0             0 VBoxSVC
Aug 15 04:18:56 worker-854308 kernel: [40375154.700999] [1297643]     0 1297643    23201      254      49       3        0             0 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.701000] [1297645]  1000 1297645    11322      178      27       3        0             0 systemd
Aug 15 04:18:56 worker-854308 kernel: [40375154.701001] [1297646]  1000 1297646    15327      497      31       3        0             0 (sd-pam)
Aug 15 04:18:56 worker-854308 kernel: [40375154.701002] [1297656]  1000 1297656    23235      251      48       3        0             0 sshd
Aug 15 04:18:56 worker-854308 kernel: [40375154.701003] [1298303]  1106 1298303      908      140       6       5        0             0 vagrant
Aug 15 04:18:56 worker-854308 kernel: [40375154.701004] [1298308]  1106 1298308   138208    13381      91       3        0             0 ruby
Aug 15 04:18:56 worker-854308 kernel: [40375154.701005] [1299694]  1106 1299694  2429872   138754    4243      12        0             0 VBoxHeadless
Aug 15 04:18:56 worker-854308 kernel: [40375154.701006] Out of memory: Kill process 1248974 (java) score 87 or sacrifice child
Aug 15 04:18:56 worker-854308 kernel: [40375154.701178] Killed process 1248974 (java) total-vm:8142056kB, anon-rss:2858108kB, file-rss:0kB, shmem-rss:0kB
Aug 15 04:18:56 worker-854308 kernel: [40375154.759585] oom_reaper: reaped process 1248974 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
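The task dump above can be summarized to see where the memory went. The following is a minimal sketch, not part of the CI tooling. It assumes two things about the dump. First, each row has the column layout this kernel prints: `[pid] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name`. Second, `rss` is counted in 4 KiB pages. The function name `rss_by_name` is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: task-dump rows follow this kernel's layout:
# [pid] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
# The bracketed dmesg timestamp ("[40375154.700995]") contains a dot,
# so it cannot match the "[digits]" pid group and is skipped by search().
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+\d+\s+\d+\s+\d+\s+(?P<adj>-?\d+)\s+(?P<name>.+?)\s*$"
)

PAGE_KIB = 4  # assumption: 4 KiB pages (the x86_64 default)


def rss_by_name(lines):
    """Sum resident set size per process name, in MiB, from OOM task-dump lines.

    Non-table lines (e.g. "Out of memory: Kill process ...") are ignored.
    Shared pages are counted once per process, so totals can overstate usage.
    """
    pages = defaultdict(int)
    for line in lines:
        m = ROW.search(line)
        if m:
            pages[m.group("name")] += int(m.group("rss"))
    return {name: n * PAGE_KIB / 1024 for name, n in pages.items()}
```

As a sanity check on the page-size assumption, the victim row for PID 1248974 shows `rss` = 714527 pages. At 4 KiB per page, that is 2858108 kB, which matches the `anon-rss:2858108kB` reported on the "Killed process" line. Running the sketch over the full dump shows that more than a dozen `java` processes owned by uid 1106 each hold roughly 0.6 to 2.8 GiB resident. This is consistent with, though not proof of, the leaked Gradle daemons mentioned earlier in the thread.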

@alpar-t alpar-t added :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts and removed :Delivery/Build Build or test infrastructure labels Sep 5, 2019
@alpar-t alpar-t self-assigned this Oct 23, 2019
@alpar-t
Contributor

alpar-t commented Oct 23, 2019

We are working to move these jobs to ephemeral workers using nested virtualization. I think this will solve this problem, too.

@jaymode
Member

jaymode commented Nov 22, 2019

Given that nested virtualization hasn't worked out as we hoped, and I have seen failures as recently as a few days ago, I think it would be best to keep this on our radar. My understanding is that we have reduced our need for metal machines to only the Vagrant jobs. What about requesting that we limit these jobs to metal workers that have 64GB of RAM?

@mark-vieira
Contributor

what about requesting that we limit these jobs to metal workers that have 64gb ram

I don't think we have a way of making that distinction right now. Perhaps we could manually label those workers that we know are "good". That might be good enough, given this is a single job, and we run it only once a day.

@jaymode
Member

jaymode commented Nov 22, 2019

Perhaps we could manually label those workers that we know are "good".

That's kind of what I had in mind; these would only run on machines labeled metal and 64GB (or some other appropriate label).

@alpar-t
Contributor

alpar-t commented Nov 25, 2019

Any other machines would go unused, so it might make more sense to simply remove (that is, return to infra) any machine that does not have 64GB.

@mark-vieira
Contributor

I'm going to close this. We now use the metal workers only for the daily Vagrant tests, and that build is configured as a matrix job, so memory pressure should be much lower because each job spins up only a single VM. That build is unfortunately failing for other reasons, but we can reopen this if the problem crops up again.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020