DLPX-83697 iscsi target login should wait until tx/rx threads have properly started #21

pcd1193182 · 2022-11-09T18:14:07Z

This is a cherry-pick of an upstream Linux kernel patch that is still in development. Because the underlying issue is difficult to reproduce, we have not confirmed that the patch resolves it; however, it should, according to our best theory of what is causing the bug. Integrating the patch now gives it ample soak time before it goes into a release.

http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/3737
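
For readers unfamiliar with the pattern named in the title, here is a minimal userspace sketch of a login path that blocks until its tx and rx workers have reported that they are running. It is an analogy built on POSIX threads, not the upstream kernel patch; `start_gate`, `worker_main`, and `login_wait_for_workers` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical start-up gate shared by the "login" path and its workers. */
struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int started;            /* number of workers that have begun running */
};

/* Worker body: announce that the thread is running, then return. */
static void *worker_main(void *arg)
{
    struct start_gate *g = arg;

    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    g->started++;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

/*
 * Spawn a "tx" and an "rx" worker and block until both have announced
 * themselves. Returns the number of started workers, or -1 if a thread
 * could not be created.
 */
static int login_wait_for_workers(void)
{
    struct start_gate g;
    pthread_t tx, rx;
    int started;

    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.cond, NULL);
    g.started = 0;

    if (pthread_create(&tx, NULL, worker_main, &g) != 0)
        return -1;
    if (pthread_create(&rx, NULL, worker_main, &g) != 0) {
        pthread_join(tx, NULL);
        return -1;
    }

    /* The login path does not proceed until both workers have checked in. */
    pthread_mutex_lock(&g.lock);
    while (g.started < 2)
        pthread_cond_wait(&g.cond, &g.lock);
    started = g.started;
    pthread_mutex_unlock(&g.lock);

    pthread_join(tx, NULL);
    pthread_join(rx, NULL);
    pthread_cond_destroy(&g.cond);
    pthread_mutex_destroy(&g.lock);
    return started;
}
```

Calling `login_wait_for_workers()` returns 2 once both workers have checked in. The gate exists so that the caller never assumes the workers are running merely because their creation was requested.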

sdimitro

Could we add a link to the upstream discussion in the commit message? Just for future reference?

…21)

BugLink: https://bugs.launchpad.net/bugs/2003914

[ Upstream commit 9de255c ]

Commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") started calling disable_irq() without holding the spinlock because it can sleep. However, that fix introduced another bug: if interrupt is already disabled and a new disable request comes in, then the spinlock is not unlocked:

root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0
root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0
root@localhost:~# [ 14.851538] BUG: scheduling while atomic: bash/223/0x00000002
[ 14.851991] Modules linked in: uio_dmem_genirq uio myfpga(OE) bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm snd_pcm ppdev joydev psmouse snd_timer snd e1000fb_sys_fops syscopyarea parport sysfillrect soundcore sysimgblt input_leds pcspkr i2c_piix4 serio_raw floppy evbug qemu_fw_cfg mac_hid pata_acpi ip_tables x_tables autofs4 [last unloaded: parport_pc]
[ 14.854206] CPU: 0 PID: 223 Comm: bash Tainted: G OE 6.0.0-rc7 #21
[ 14.854786] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 14.855664] Call Trace:
[ 14.855861] <TASK>
[ 14.856025] dump_stack_lvl+0x4d/0x67
[ 14.856325] dump_stack+0x14/0x1a
[ 14.856583] __schedule_bug.cold+0x4b/0x5c
[ 14.856915] __schedule+0xe81/0x13d0
[ 14.857199] ? idr_find+0x13/0x20
[ 14.857456] ? get_work_pool+0x2d/0x50
[ 14.857756] ? __flush_work+0x233/0x280
[ 14.858068] ? __schedule+0xa95/0x13d0
[ 14.858307] ? idr_find+0x13/0x20
[ 14.858519] ? get_work_pool+0x2d/0x50
[ 14.858798] schedule+0x6c/0x100
[ 14.859009] schedule_hrtimeout_range_clock+0xff/0x110
[ 14.859335] ? tty_write_room+0x1f/0x30
[ 14.859598] ? n_tty_poll+0x1ec/0x220
[ 14.859830] ? tty_ldisc_deref+0x1a/0x20
[ 14.860090] schedule_hrtimeout_range+0x17/0x20
[ 14.860373] do_select+0x596/0x840
[ 14.860627] ? __kernel_text_address+0x16/0x50
[ 14.860954] ? poll_freewait+0xb0/0xb0
[ 14.861235] ? poll_freewait+0xb0/0xb0
[ 14.861517] ? rpm_resume+0x49d/0x780
[ 14.861798] ? common_interrupt+0x59/0xa0
[ 14.862127] ? asm_common_interrupt+0x2b/0x40
[ 14.862511] ? __uart_start.isra.0+0x61/0x70
[ 14.862902] ? __check_object_size+0x61/0x280
[ 14.863255] core_sys_select+0x1c6/0x400
[ 14.863575] ? vfs_write+0x1c9/0x3d0
[ 14.863853] ? vfs_write+0x1c9/0x3d0
[ 14.864121] ? _copy_from_user+0x45/0x70
[ 14.864526] do_pselect.constprop.0+0xb3/0xf0
[ 14.864893] ? do_syscall_64+0x6d/0x90
[ 14.865228] ? do_syscall_64+0x6d/0x90
[ 14.865556] __x64_sys_pselect6+0x76/0xa0
[ 14.865906] do_syscall_64+0x60/0x90
[ 14.866214] ? syscall_exit_to_user_mode+0x2a/0x50
[ 14.866640] ? do_syscall_64+0x6d/0x90
[ 14.866972] ? do_syscall_64+0x6d/0x90
[ 14.867286] ? do_syscall_64+0x6d/0x90
[ 14.867626] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...] stripped
[ 14.872959] </TASK>

('myfpga' is a simple 'uio_dmem_genirq' driver I wrote to test this)

The implementation of "uio_dmem_genirq" was based on "uio_pdrv_genirq" and it is used in a similar manner to the "uio_pdrv_genirq" driver with respect to interrupt configuration and handling. At the time "uio_dmem_genirq" was introduced, both had the same implementation of the 'uio_info' handlers irqcontrol() and handler(). Then commit 34cb275 ("UIO: Fix concurrency issue"), which was only applied to "uio_pdrv_genirq", ended up making them a little different. That commit, among other things, changed disable_irq() to disable_irq_nosync() in the implementation of irqcontrol(). The motivation there was to avoid a deadlock between irqcontrol() and handler(), since it added a spinlock in the irq handler, and disable_irq() waits for the completion of the irq handler.

By changing disable_irq() to disable_irq_nosync() in irqcontrol(), we also avoid the sleeping-while-atomic bug that commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") was trying to fix. Thus, this fixes the missing unlock in irqcontrol() by importing the implementation of irqcontrol() handler from the "uio_pdrv_genirq" driver. In the end, it reverts commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") and change disable_irq() to disable_irq_nosync().

It is worth noting that this still does not address the concurrency issue fixed by commit 34cb275 ("UIO: Fix concurrency issue"). It will be addressed separately in the next commits.

Split out from commit 34cb275 ("UIO: Fix concurrency issue").

Fixes: b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220930224100.816175-2-rafaelmendsr@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
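
The shape of the fix described in this commit message can be illustrated with a small userspace sketch. It assumes a POSIX mutex as a stand-in for the driver's spinlock and plain counters in place of the real enable_irq() and disable_irq_nosync() calls; `irq_state`, `irqcontrol_sketch`, and `lock_is_free` are hypothetical names, and none of this is the driver's code. What it shows is a single lock/unlock pair around both branches, so a repeated disable request has no early-return path that could leave the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the driver's private state. */
struct irq_state {
    pthread_mutex_t lock;   /* plays the role of the driver's spinlock */
    bool disabled;          /* plays the role of the "IRQ disabled" flag */
    int enable_calls;       /* times the hardware enable would be reached */
    int disable_calls;      /* times the non-blocking disable would be reached */
};

static void irq_state_init(struct irq_state *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->disabled = false;
    s->enable_calls = 0;
    s->disable_calls = 0;
}

/*
 * Enable (irq_on != 0) or disable (irq_on == 0) the interrupt.
 * Both branches sit between one lock and one unlock, so every path,
 * including "already disabled", releases the lock before returning.
 */
static int irqcontrol_sketch(struct irq_state *s, int irq_on)
{
    pthread_mutex_lock(&s->lock);
    if (irq_on) {
        if (s->disabled) {
            s->disabled = false;
            s->enable_calls++;
        }
    } else {
        if (!s->disabled) {
            s->disabled = true;
            s->disable_calls++;     /* must not block while the lock is held */
        }
    }
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Returns true if nobody currently holds the lock. */
static bool lock_is_free(struct irq_state *s)
{
    if (pthread_mutex_trylock(&s->lock) != 0)
        return false;
    pthread_mutex_unlock(&s->lock);
    return true;
}
```

Issuing two disable requests in a row mirrors the two `printf` writes in the reproducer: the second request is a no-op, and `lock_is_free()` still reports the lock as released afterwards.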

…21)

BugLink: https://bugs.launchpad.net/bugs/2073765

commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
#0 machine_kexec at ffffffff8c069932
#1 __crash_kexec at ffffffff8c1338fa
#2 panic at ffffffff8c1d69b9
#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
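
The credit exhaustion described in this commit message, and the "top up before each extent" remedy, can be shown with a toy model. This is a sketch under simplifying assumptions, not OCFS2 or jbd2 code: `toy_handle`, `toy_ensure_credits`, `toy_mark_written`, and `toy_end_io` are hypothetical names, and a fixed per-extent cost stands in for the real credit accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Toy transaction handle: a credit budget plus a count of top-ups. */
struct toy_handle {
    int credits;    /* credits still available in the transaction */
    int extends;    /* how many times the transaction was topped up */
};

/* Make sure at least `need` credits are available, topping up if not. */
static void toy_ensure_credits(struct toy_handle *h, int need)
{
    assert(h != NULL && need >= 0);
    if (h->credits < need) {
        h->credits = need;
        h->extends++;
    }
}

/*
 * Mark one extent written, consuming `cost` credits.
 * Returns -1 when the budget is exhausted, standing in for the abort.
 */
static int toy_mark_written(struct toy_handle *h, int cost)
{
    if (h->credits < cost)
        return -1;
    h->credits -= cost;
    return 0;
}

/*
 * Process `n_extents` extents. With top_up == 0 the initial estimate is
 * all there is; with top_up != 0 credits are checked before every extent.
 */
static int toy_end_io(struct toy_handle *h, size_t n_extents, int cost,
                      int top_up)
{
    for (size_t i = 0; i < n_extents; i++) {
        if (top_up)
            toy_ensure_credits(h, cost);
        if (toy_mark_written(h, cost) != 0)
            return -1;
    }
    return 0;
}
```

With a budget of 8 credits and a cost of 2 per extent, a 100-extent IO fails on the fifth extent when no top-up is done, and completes when credits are checked before each extent.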

…21)

BugLink: https://bugs.launchpad.net/bugs/2076435

commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
#0 machine_kexec at ffffffff8c069932
#1 __crash_kexec at ffffffff8c1338fa
#2 panic at ffffffff8c1d69b9
#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>

…21)

target: login should wait until tx/rx threads have properly started.

95c519a

pcd1193182 requested review from sebroy, sdimitro and sumedhbala-delphix November 9, 2022 18:14

sdimitro approved these changes Nov 9, 2022

pcd1193182 changed the title ~~target: login should wait until tx/rx threads have properly started.~~ DLPX-83697 iscsi target login should wait until tx/rx threads have properly started Nov 9, 2022

don-brady approved these changes Nov 14, 2022

Merge branch '6.0/stage' into iscsi_aws

9fad915

pcd1193182 enabled auto-merge (squash) November 14, 2022 21:33

pcd1193182 disabled auto-merge November 14, 2022 21:34

pcd1193182 enabled auto-merge (squash) November 14, 2022 21:34

pcd1193182 merged commit a4ad754 into delphix:6.0/stage Nov 14, 2022

delphix-devops-bot pushed a commit that referenced this pull request Nov 17, 2022

target: login should wait until tx/rx threads have properly started. (#…

18a0757

…21)

delphix-devops-bot pushed a commit that referenced this pull request Nov 18, 2022

target: login should wait until tx/rx threads have properly started. (#…

596ac27

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 15, 2022

target: login should wait until tx/rx threads have properly started. (#…

bc7a4ba

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 7, 2023

target: login should wait until tx/rx threads have properly started. (#…

c5751d9

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 16, 2023

target: login should wait until tx/rx threads have properly started. (#…

108a7a6

…21)

delphix-devops-bot pushed a commit that referenced this pull request Feb 10, 2023

target: login should wait until tx/rx threads have properly started. (#…

655669f

…21)

delphix-devops-bot pushed a commit that referenced this pull request Mar 4, 2023

target: login should wait until tx/rx threads have properly started. (#…

3917c9d

…21)

prakashsurya pushed a commit that referenced this pull request Mar 11, 2023

target: login should wait until tx/rx threads have properly started. (#…

ee6f2f2

…21)

prakashsurya pushed a commit that referenced this pull request Mar 14, 2023

target: login should wait until tx/rx threads have properly started. (#…

e251563

…21)

prakashsurya pushed a commit that referenced this pull request Mar 14, 2023

target: login should wait until tx/rx threads have properly started. (#…

ec11710

…21)

delphix-devops-bot pushed a commit that referenced this pull request Mar 30, 2023

target: login should wait until tx/rx threads have properly started. (#…

c1d1f4a

…21)

delphix-devops-bot pushed a commit that referenced this pull request Apr 20, 2023

target: login should wait until tx/rx threads have properly started. (#…

7cea709

…21)

delphix-devops-bot pushed a commit that referenced this pull request Apr 28, 2023

target: login should wait until tx/rx threads have properly started. (#…

6921143

…21)

delphix-devops-bot pushed a commit that referenced this pull request May 31, 2023

target: login should wait until tx/rx threads have properly started. (#…

7527ac9

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jun 3, 2023

target: login should wait until tx/rx threads have properly started. (#…

aed2ff8

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jun 4, 2023

target: login should wait until tx/rx threads have properly started. (#…

08842c6

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jun 5, 2023

target: login should wait until tx/rx threads have properly started. (#…

4581b98

…21)

delphix-devops-bot pushed a commit that referenced this pull request Aug 22, 2024

target: login should wait until tx/rx threads have properly started. (#…

b20cfa3

…21)

delphix-devops-bot pushed a commit that referenced this pull request Aug 23, 2024

target: login should wait until tx/rx threads have properly started. (#…

c76ea4b

…21)

prakashsurya pushed a commit that referenced this pull request Sep 23, 2024

target: login should wait until tx/rx threads have properly started. (#…

7c13578

…21)

delphix-devops-bot pushed a commit that referenced this pull request Oct 20, 2024

target: login should wait until tx/rx threads have properly started. (#…

c09c3e1

…21)

delphix-devops-bot pushed a commit that referenced this pull request Oct 21, 2024

target: login should wait until tx/rx threads have properly started. (#…

5c4f4c4

…21)

palash-gandhi pushed a commit that referenced this pull request Oct 24, 2024

target: login should wait until tx/rx threads have properly started. (#…

52012ae

…21)

delphix-devops-bot pushed a commit that referenced this pull request Nov 10, 2024

target: login should wait until tx/rx threads have properly started. (#…

0d09878

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 18, 2024

target: login should wait until tx/rx threads have properly started. (#…

7382379

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 19, 2024

target: login should wait until tx/rx threads have properly started. (#…

500b272

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 20, 2024

target: login should wait until tx/rx threads have properly started. (#…

c94fd9f

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 21, 2024

target: login should wait until tx/rx threads have properly started. (#…

0227ae0

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 22, 2024

target: login should wait until tx/rx threads have properly started. (#…

c024283

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 23, 2024

target: login should wait until tx/rx threads have properly started. (#…

78a3e82

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 24, 2024

target: login should wait until tx/rx threads have properly started. (#…

b268b52

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 25, 2024

target: login should wait until tx/rx threads have properly started. (#…

400064a

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 26, 2024

target: login should wait until tx/rx threads have properly started. (#…

7ef90cb

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 27, 2024

target: login should wait until tx/rx threads have properly started. (#…

6a49c06

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 28, 2024

target: login should wait until tx/rx threads have properly started. (#…

d3008f8

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 29, 2024

target: login should wait until tx/rx threads have properly started. (#…

1de86cb

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 30, 2024

target: login should wait until tx/rx threads have properly started. (#…

6e641d1

…21)

delphix-devops-bot pushed a commit that referenced this pull request Dec 31, 2024

target: login should wait until tx/rx threads have properly started. (#…

289b26a

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 1, 2025

target: login should wait until tx/rx threads have properly started. (#…

70ed394

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 2, 2025

target: login should wait until tx/rx threads have properly started. (#…

24b8314

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 3, 2025

target: login should wait until tx/rx threads have properly started. (#…

4472b55

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 4, 2025

target: login should wait until tx/rx threads have properly started. (#…

dcf6c3b

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 5, 2025

target: login should wait until tx/rx threads have properly started. (#…

05f09f3

…21)

delphix-devops-bot pushed a commit that referenced this pull request Jan 7, 2025

target: login should wait until tx/rx threads have properly started. (#…

d99f089

…21)

delphix-devops-bot pushed a commit that referenced this pull request Feb 12, 2025

target: login should wait until tx/rx threads have properly started. (#…

a34bb13

…21)
