
Soft lockup on Tempesta start #228

Closed
krizhanovsky opened this issue Sep 2, 2015 · 2 comments
@krizhanovsky (Contributor)

Seems to be related to commit 8a13086.

On Tempesta start:

     # ./tempesta.sh -f --start 
     Loading Tempesta kernel modules...
     Load Frang
     Starting Tempesta...

     Message from syslogd@localhost at Sep  2 17:29:12 ...
      kernel:BUG: soft lockup - CPU#0 stuck for 22s! [sysctl:3726]

dmesg shows ss_connect() in the call trace:

    [tdb] Start Tempesta DB
    [tempesta] Initializing Tempesta FW kernel module...
    [tempesta] init: cfg_if
    [tempesta] init: tls
    [tempesta] init: http
    [tempesta] init: http_sticky
    [tempesta] init: server
    [tempesta] init: client
    [tempesta] init: sock_srv
    [tempesta] init: sock_clnt
    [tempesta]   register cfg: filter
    [tempesta]   register cfg: cache
    [tempesta]   register cfg: http_sticky
    [tempesta]   register cfg: sock_srv
    [tempesta]   register cfg: sock_clnt
    [tempesta] Registering new scheduler: dummy
    [tempesta]   tfw_sched_hash: init
    [tempesta] Registering new scheduler: hash
    [tempesta]   tfw_sched_http: init
    [tempesta]   register cfg: tfw_sched_http
    [tempesta] Registering new scheduler: http
    [tempesta]   tfw_sched_rr: init
    [tempesta] Registering new scheduler: round-robin
    [tempesta]   register cfg: frang
    [tempesta] Warning: frang module can't be unloaded, so all allocated resources won't freed
    [tempesta] got state via sysctl: start
    [tempesta]   reading configuration file...
    [tempesta]   reading file: /root/tempesta/etc/tempesta_fw.conf
    [tempesta]   file size: 466 bytes
    [tempesta]   read by offset: 0
    [tempesta]   read by offset: 466
    [tempesta] starting all modules...
    [tempesta]   parsing configuration and pushing it to modules...
    [tempesta]   spec handle: 'server'
    [tempesta]   new server group: 'default'
    [tempesta]   spec handle: 'cache'
    [tempesta]   spec handle: 'frang_limits'
    [tempesta]   spec handle: 'request_rate'
    [tempesta]   spec handle: 'request_burst'
    [tempesta]   spec handle: 'connection_rate'
    [tempesta]   spec handle: 'concurrent_connections'
    [tempesta]   spec handle: 'client_header_timeout'
    [tempesta]   spec handle: 'client_body_timeout'
    [tempesta]   spec handle: 'http_uri_len'
    [tempesta]   spec handle: 'http_field_len'
    [tempesta]   spec handle: 'http_ct_required'
    [tempesta]   spec handle: 'http_methods'
    [tempesta]   frang: parsed method: get => 1
    [tempesta]   frang: parsed method: post => 3
    [tempesta]   frang: parsed method: head => 2
    [tempesta]   parsed methods_mask: 0xe
    [tempesta]   spec handle: 'http_ct_vals'
    [tempesta]   parsed Content-Type value: 'text/plain'
    [tempesta]   parsed Content-Type value: 'text/html'
    [tempesta]   spec handle: 'http_header_chunk_cnt'
    [tempesta]   spec handle: 'http_body_chunk_cnt'
    [tempesta]   use default entry: 'connection_burst 0;'
    [tempesta]   spec handle: 'connection_burst'
    [tempesta]   use default entry: 'http_body_len 0;'
    [tempesta]   spec handle: 'http_body_len'
    [tempesta]   use default entry: 'http_host_required false;'
    [tempesta]   spec handle: 'http_host_required'
    [tempesta]   use default entry: 'filter_tbl_size 16777216;'
    [tempesta]   spec handle: 'filter_tbl_size'
    [tempesta]   use default entry: 'filter_db /opt/tempesta/db/filter.tdb;'
    [tempesta]   spec handle: 'filter_db'
    [tempesta]   use default entry: 'cache_size 268435456;'
    [tempesta]   spec handle: 'cache_size'
    [tempesta]   use default entry: 'cache_db /opt/tempesta/db/cache.tdb;'
    [tempesta]   spec handle: 'cache_db'
    [tempesta]   use default entry: 'listen 80;'
    [tempesta]   spec handle: 'listen'
    [tempesta]   starting modules...
    [tempesta]   mod_start(): filter
    [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff880073000000
    [tempesta]   mod_start(): cache
    [tempesta]   mod_start(): http_sticky
    [tempesta]   mod_start(): sock_srv
    BUG: soft lockup - CPU#0 stuck for 22s! [sysctl:3726]
    Modules linked in: tfw_frang(O) tfw_sched_rr(O) tfw_sched_http(O) tfw_sched_hash(O) tfw_sched_dummy(O) tempesta_fw(O) tempesta_db(O) nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables xfs exportfs ppdev parport_pc serio_raw parport pcspkr microcode i2c_piix4 i2c_core dm_mirror dm_region_hash dm_log dm_mod uinput btrfs xor zlib_deflate raid6_pq libcrc32c ata_generic pata_acpi e1000 ata_piix floppy ipv6 autofs4
    CPU: 0 PID: 3726 Comm: sysctl Tainted: G           O 3.10.10+ #4
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
    task: ffff880079f48000 ti: ffff88007b80c000 task.ti: ffff88007b80c000
    RIP: 0010:[<ffffffff814d30c2>]  [<ffffffff814d30c2>] _raw_spin_lock+0x22/0x30
    RSP: 0018:ffff88007fc03d08  EFLAGS: 00000297
    RAX: 0000000000000000 RBX: ffff88007fc03c98 RCX: 0000000000000001
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88002c4c58d0
    RBP: ffff88007fc03d08 R08: 0000000000000774 R09: 00000000d76a0000
    R10: 000000000100007f R11: 0000000000000001 R12: ffff88007fc03c78
    R13: ffffffff814dbe0a R14: ffff88007fc03d08 R15: ffffffff81a96d00
    FS:  00007f34f034d740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f34ef9d1320 CR3: 000000002c39f000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
     ffff88007fc03d80 ffffffff81461d1a 0000000000000001 ffff88007b067840
     ffffffff8143c2c0 ffff880078563000 0000d76a0000901f 0100007f00000001
     ffffffff81a96d00 ffffffff0100007f ffff88007b067840 ffffffff816699e0
    Call Trace:
     <IRQ> 
     [<ffffffff81461d1a>] tcp_v4_rcv+0x66a/0x7e0
     [<ffffffff8143c2c0>] ? ip_rcv_finish+0x320/0x320
     [<ffffffff8143c390>] ip_local_deliver_finish+0xd0/0x210
     [<ffffffff8143c6a8>] ip_local_deliver+0x88/0x90
     [<ffffffff8143c018>] ip_rcv_finish+0x78/0x320
     [<ffffffff8143c91d>] ip_rcv+0x26d/0x390
     [<ffffffff81405776>] __netif_receive_skb_core+0x5e6/0x820
     [<ffffffff814059cd>] __netif_receive_skb+0x1d/0x60
     [<ffffffff814064b0>] process_backlog+0xa0/0x160
     [<ffffffff81405d79>] net_rx_action+0x139/0x220
     [<ffffffff810461f0>] __do_softirq+0x100/0x250
     [<ffffffff814dc47c>] call_softirq+0x1c/0x30
     <EOI> 
     [<ffffffff810041ed>] do_softirq+0x3d/0x80
     [<ffffffff81045814>] local_bh_enable+0x94/0xa0
     [<ffffffff81440828>] ip_finish_output+0x1b8/0x3b0
     [<ffffffff81441d70>] ip_output+0x90/0xa0
     [<ffffffff81441495>] ip_local_out+0x25/0x30
     [<ffffffff814417eb>] ip_queue_xmit+0x14b/0x3f0
     [<ffffffff81458f2d>] tcp_transmit_skb+0x45d/0x8c0
     [<ffffffff8145ba96>] tcp_connect+0x4f6/0x5d0
     [<ffffffff8108ecbe>] ? getnstimeofday+0xe/0x30
     [<ffffffff8108ed46>] ? ktime_get_real+0x16/0x50
     [<ffffffff8145f1aa>] tcp_v4_connect+0x31a/0x480
     [<ffffffffa046f61e>] ss_connect+0xcb/0xf2 [tempesta_fw]
     [<ffffffffa0470735>] tfw_sock_srv_connect_try+0xb8/0xfb [tempesta_fw]
     [<ffffffffa0470cc0>] tfw_sock_srv_connect_srv+0x36/0x76 [tempesta_fw]
     [<ffffffffa046c9e1>] tfw_sg_for_each_srv+0x73/0xff [tempesta_fw]
     [<ffffffffa0470c8a>] ? tfw_sock_srv_disconnect+0x95/0x95 [tempesta_fw]
     [<ffffffffa0470d85>] tfw_sock_srv_connect_all+0x15/0x17 [tempesta_fw]
     [<ffffffffa0448830>] mod_start+0x73/0xa0 [tempesta_fw]
     [<ffffffffa0448ce3>] tfw_cfg_start_mods+0xf5/0x1ef [tempesta_fw]
     [<ffffffffa04491e9>] handle_state_change+0x102/0x1a2 [tempesta_fw]
     [<ffffffffa04493d8>] handle_sysctl_state_io+0x14f/0x1ba [tempesta_fw]
     [<ffffffffa04492dd>] ? handle_sysctl_state_io+0x54/0x1ba [tempesta_fw]
     [<ffffffff811c1813>] proc_sys_call_handler+0xb3/0xc0
     [<ffffffff811c1813>] ? proc_sys_call_handler+0xb3/0xc0
     [<ffffffff811c1834>] proc_sys_write+0x14/0x20
     [<ffffffff81155f7d>] vfs_write+0xbd/0x1e0
     [<ffffffff81156949>] SyS_write+0x49/0xa0
     [<ffffffff814db242>] system_call_fastpath+0x16/0x1b
    Code: 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 89 ca 74 0c 66 90 f3 90 <0f> b7 07 66 39 d0 75 f6 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 
@krizhanovsky (Contributor, Author)

Actually, ss_connect() should not compete for the socket with other socket functions, and the lock adjusted in 8a13086 could be removed entirely. In fact, the original Linux code uses lock_sock(), which works correctly when preempted by a softirq (the case in this issue).

However, the current socket failover code first stores the socket pointer in conn->sk, so tfw_sg_update() considers the socket alive while it is still in the process of establishing a new connection. Thus ss_send() and ss_close() can be called concurrently with the opening of a new connection. The first call on a half-open connection could be eliminated by a smarter check for alive connections.

The second call is much more complex. Connection establishment is asynchronous, and we must be able to close half-open sockets on Tempesta shutdown. The problem could be solved by a two-phase shutdown process:

  1. stop all services, unregister all callbacks, and wait until all softirqs running Tempesta code have stopped. If ss_tcp_state_change() is called in a softirq for an ESTABLISHED connection, the socket should be closed immediately in the same function;
  2. now all sockets in the server connection pool are fully open or closed, so we can free all resources and close the sockets.

@krizhanovsky (Contributor, Author)

Actually, we need a synchronous connection-closing interface due to the requirements of #110 and #116 (see #116 (comment)). The issue is fixed except for the correct Tempesta shutdown case; the rest should be fixed in #116 and #254.
