-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test DPDK driver patch #3
Labels
Comments
commit a198a4d provide an enhancement. |
thehajime
pushed a commit
that referenced
this issue
Apr 8, 2015
Fixed two warnings in e1000e and igb, when switching to timespec64 some printf formats started to not match. In theses cases actually the new type is __kernel_time_t which is __kernel_long_t which unfortunately can be either "long" or "long long". So to solve this I cases the arguments to "long long". -DaveM Richard Cochran says: ==================== ptp: get ready for 2038 This series converts the core driver methods of the PTP Hardware Clock (PHC) subsystem to use the 64 bit version of the timespec structure, making the core API ready for the year 2038. In addition, I reviewed how each driver and device represents the time value at the hardware register level. Most of the drivers are ready, but a few will need some work before the year 2038, as shown: Patch Driver ------------------------------------------------ 12 drivers/net/ethernet/intel/igb/igb_ptp.c 15 ? drivers/net/ethernet/sfc/ptp.c 16 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c The commit log messages document how each driver is ready or why it is not ready. For patch 15, I could not easily find out the hardware representation of the time value, and so the SFC maintainers will have to review their low level code in order to resolve any remaining issues. * ChangeLog ** V3 - dp83640: use timespec64 throughout per Arnd's suggestion - tilegx: use timespec64 throughout per Chris' suggestion - add Jeff's acked-bys ** V2 - use the new methods in the posix clock code right away (patch #3) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
Apr 8, 2015
Andrew Lunn says: ==================== DSA Mavell drivers refactoring and cleanup v1->v2: * Add missing signed-of-by: For patches authored by Guenter Roeck. * Add Reviewed by from Guenter Roack to patch #5. This is a collection of patches again net-next from today containing refactoring and consolidate of code, cleanups and using #define's to replace register numbers. Patch #1 Swaps the 6131 driver to use the consolidated setup code. Patch #2 Moves the Switch IDs used during probe into a central location. We need these later so that we can differentiate the different features the devices have. Patch #3 Makes the 6131 driver set the number of ports in the private state structure. It then uses this, rather than hard coded maximum number of ports. Patch #4 Similar to Patch #3, but for the 6123_61_65 driver. Patch #5 Similar to Patch #3, and #4, but for all the remaining drivers. This greatly increases the similarity of the code between drivers, allow further patches to consolidate the duplicated code. Patch #6 Consolidate the switch reset code, which has two minor variants. Removes around 35 lines per driver. Patch #7 Moves phy page access functions out of the 6352 driver into the shared code. Currently only the 6352 driver uses this, but it is likely other devices will come along wanting this functionality. Patch #8 Consolidates the code used to access phy registers. Removes around 40 lines of code per driver. Patch #9 Fixes missing mutex locking in the EEE code, and refactors the code a bit to make it more understandable with respect to locks. Patch #10 Consolidates reading statistics. This is very similar code for all devices, but the number of available statistics differ, which can be determined from the product ID. Removes around 65 lines per driver. Patch #11 Add #defines for registers, and bits within the registers. For the moment, this is limited to the shared code. The individual drivers will be converted once the remaining duplicated code is consolidated Patch #12 Fix broken statistic counters on the 6172. The 6352 family requires the port number is poked into a different set of bits in the register compared to other devices. Many thanks to Guenter Roeck for repeatedly reviewing the patches and testing them on his hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
Apr 22, 2015
…_ram allocator Recently I came across high fragmentation of vm_map_ram allocator: vmap_block has free space, but still new blocks continue to appear. Further investigation showed that certain mapping/unmapping sequences can exhaust vmalloc space. On small 32bit systems that's not a big problem, cause purging will be called soon on a first allocation failure (alloc_vmap_area), but on 64bit machines, e.g. x86_64 has 45 bits of vmalloc space, that can be a disaster. 1) I came up with a simple allocation sequence, which exhausts virtual space very quickly: while (iters) { /* Map/unmap big chunk */ vaddr = vm_map_ram(pages, 16, -1, PAGE_KERNEL); vm_unmap_ram(vaddr, 16); /* Map/unmap small chunks. * * -1 for hole, which should be left at the end of each block * to keep it partially used, with some free space available */ for (i = 0; i < (VMAP_BBMAP_BITS - 16) / 8 - 1; i++) { vaddr = vm_map_ram(pages, 8, -1, PAGE_KERNEL); vm_unmap_ram(vaddr, 8); } } The idea behind is simple: 1. We have to map a big chunk, e.g. 16 pages. 2. Then we have to occupy the remaining space with smaller chunks, i.e. 8 pages. At the end small hole should remain to keep block in free list, but do not let big chunk to occupy remaining space. 3. Goto 1 - allocation request of 16 pages can't be completed (only 8 slots are left free in the block in the #2 step), new block will be allocated, all further requests will lay into newly allocated block. To have some measurement numbers for all further tests I setup ftrace and enabled 4 basic calls in a function profile: echo vm_map_ram > /sys/kernel/debug/tracing/set_ftrace_filter; echo alloc_vmap_area >> /sys/kernel/debug/tracing/set_ftrace_filter; echo vm_unmap_ram >> /sys/kernel/debug/tracing/set_ftrace_filter; echo free_vmap_block >> /sys/kernel/debug/tracing/set_ftrace_filter; So for this scenario I got these results: BEFORE (all new blocks are put to the head of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 126000 30683.30 us 0.243 us 30819.36 us vm_unmap_ram 126000 22003.24 us 0.174 us 340.886 us alloc_vmap_area 1000 4132.065 us 4.132 us 0.903 us AFTER (all new blocks are put to the tail of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 126000 28713.13 us 0.227 us 24944.70 us vm_unmap_ram 126000 20403.96 us 0.161 us 1429.872 us alloc_vmap_area 993 3916.795 us 3.944 us 29.370 us free_vmap_block 992 654.157 us 0.659 us 1.273 us SUMMARY: The most interesting numbers in those tables are numbers of block allocations and deallocations: alloc_vmap_area and free_vmap_block calls, which show that before the change blocks were not freed, and virtual space and physical memory (vmap_block structure allocations, etc) were consumed. Average time which were spent in vm_map_ram/vm_unmap_ram became slightly better. That can be explained with a reasonable amount of blocks in a free list, which we need to iterate to find a suitable free block. 2) Another scenario is a random allocation: while (iters) { /* Randomly take number from a range [1..32/64] */ nr = rand(1, VMAP_MAX_ALLOC); vaddr = vm_map_ram(pages, nr, -1, PAGE_KERNEL); vm_unmap_ram(vaddr, nr); } I chose mersenne twister PRNG to generate persistent random state to guarantee that both runs have the same random sequence. For each vm_map_ram call random number from [1..32/64] was taken to represent amount of pages which I do map. I did 10'000 vm_map_ram calls and got these two tables: BEFORE (all new blocks are put to the head of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 10000 10170.01 us 1.017 us 993.609 us vm_unmap_ram 10000 5321.823 us 0.532 us 59.789 us alloc_vmap_area 420 2150.239 us 5.119 us 3.307 us free_vmap_block 37 159.587 us 4.313 us 134.344 us AFTER (all new blocks are put to the tail of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 10000 7745.637 us 0.774 us 395.229 us vm_unmap_ram 10000 5460.573 us 0.546 us 67.187 us alloc_vmap_area 414 2201.650 us 5.317 us 5.591 us free_vmap_block 412 574.421 us 1.394 us 15.138 us SUMMARY: 'BEFORE' table shows, that 420 blocks were allocated and only 37 were freed. Remained 383 blocks are still in a free list, consuming virtual space and physical memory. 'AFTER' table shows, that 414 blocks were allocated and 412 were really freed. 2 blocks remained in a free list. So fragmentation was dramatically reduced. Why? Because when we put newly allocated block to the head, all further requests will occupy new block, regardless remained space in other blocks. In this scenario all requests come randomly. Eventually remained free space will be less than requested size, free list will be iterated and it is possible that nothing will be found there - finally new block will be created. So exhaustion in random scenario happens for the maximum possible allocation size: 32 pages for 32-bit system and 64 pages for 64-bit system. Also average cost of vm_map_ram was reduced from 1.017 us to 0.774 us. Again this can be explained by iteration through smaller list of free blocks. 3) Next simple scenario is a sequential allocation, when the allocation order is increased for each block. This scenario forces allocator to reach maximum amount of partially free blocks in a free list: while (iters) { /* Populate free list with blocks with remaining space */ for (order = 0; order <= ilog2(VMAP_MAX_ALLOC); order++) { nr = VMAP_BBMAP_BITS / (1 << order); /* Leave a hole */ nr -= 1; for (i = 0; i < nr; i++) { vaddr = vm_map_ram(pages, (1 << order), -1, PAGE_KERNEL); vm_unmap_ram(vaddr, (1 << order)); } /* Completely occupy blocks from a free list */ for (order = 0; order <= ilog2(VMAP_MAX_ALLOC); order++) { vaddr = vm_map_ram(pages, (1 << order), -1, PAGE_KERNEL); vm_unmap_ram(vaddr, (1 << order)); } } Results which I got: BEFORE (all new blocks are put to the head of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 2032000 399545.2 us 0.196 us 467123.7 us vm_unmap_ram 2032000 363225.7 us 0.178 us 111405.9 us alloc_vmap_area 7001 30627.76 us 4.374 us 495.755 us free_vmap_block 6993 7011.685 us 1.002 us 159.090 us AFTER (all new blocks are put to the tail of a free list) # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- vm_map_ram 2032000 394259.7 us 0.194 us 589395.9 us vm_unmap_ram 2032000 292500.7 us 0.143 us 94181.08 us alloc_vmap_area 7000 31103.11 us 4.443 us 703.225 us free_vmap_block 7000 6750.844 us 0.964 us 119.112 us SUMMARY: No surprises here, almost all numbers are the same. Fixing this fragmentation problem I also did some improvements in a allocation logic of a new vmap block: occupy block immediately and get rid of extra search in a free list. Also I replaced dirty bitmap with min/max dirty range values to make the logic simpler and slightly faster, since two longs comparison costs less, than loop thru bitmap. This patchset raises several questions: Q: Think the problem you comments is already known so that I wrote comments about it as "it could consume lots of address space through fragmentation". Could you tell me about your situation and reason why it should be avoided? Gioh Kim A: Indeed, there was a commit 3643763 which adds explicit comment about fragmentation. But fragmentation which is described in this comment caused by mixing of long-lived and short-lived objects, when a whole block is pinned in memory because some page slots are still in use. But here I am talking about blocks which are free, nobody uses them, and allocator keeps them alive forever, continuously allocating new blocks. Q: I think that if you put newly allocated block to the tail of a free list, below example would results in enormous performance degradation. new block: 1MB (256 pages) while (iters--) { vm_map_ram(3 or something else not dividable for 256) * 85 vm_unmap_ram(3) * 85 } On every iteration, it needs newly allocated block and it is put to the tail of a free list so finding it consumes large amount of time. Joonsoo Kim A: Second patch in current patchset gets rid of extra search in a free list, so new block will be immediately occupied.. Also, the scenario above is impossible, cause vm_map_ram allocates virtual range in orders, i.e. 2^n. I.e. passing 3 to vm_map_ram you will allocate 4 slots in a block and 256 slots (capacity of a block) of course dividable on 4, so block will be completely occupied. But there is a worst case which we can achieve: each free block has a hole equal to order size. The maximum size of allocation is 64 pages for 64-bit system (if you try to map more, original alloc_vmap_area will be called). So the maximum order is 6. That means that worst case, before allocator makes a decision to allocate a new block, is to iterate 7 blocks: HEAD 1st block - has 1 page slot free (order 0) 2nd block - has 2 page slots free (order 1) 3rd block - has 4 page slots free (order 2) 4th block - has 8 page slots free (order 3) 5th block - has 16 page slots free (order 4) 6th block - has 32 page slots free (order 5) 7th block - has 64 page slots free (order 6) TAIL So the worst scenario on 64-bit system is that each CPU queue can have 7 blocks in a free list. This can happen only and only if you allocate blocks increasing the order. (as I did in the function written in the comment of the first patch) This is weird and rare case, but still it is possible. Afterwards you will get 7 blocks in a list. All further requests should be placed in a newly allocated block or some free slots should be found in a free list. Seems it does not look dramatically awful. This patch (of 3): If suitable block can't be found, new block is allocated and put into a head of a free list, so on next iteration this new block will be found first. That's bad, because old blocks in a free list will not get a chance to be fully used, thus fragmentation will grow. Let's consider this simple example: #1 We have one block in a free list which is partially used, and where only one page is free: HEAD |xxxxxxxxx-| TAIL ^ free space for 1 page, order 0 #2 New allocation request of order 1 (2 pages) comes, new block is allocated since we do not have free space to complete this request. New block is put into a head of a free list: HEAD |----------|xxxxxxxxx-| TAIL #3 Two pages were occupied in a new found block: HEAD |xx--------|xxxxxxxxx-| TAIL ^ two pages mapped here #4 New allocation request of order 0 (1 page) comes. Block, which was created on #2 step, is located at the beginning of a free list, so it will be found first: HEAD |xxX-------|xxxxxxxxx-| TAIL ^ ^ page mapped here, but better to use this hole It is obvious, that it is better to complete request of #4 step using the old block, where free space is left, because in other case fragmentation will be highly increased. But fragmentation is not only the case. The worst thing is that I can easily create scenario, when the whole vmalloc space is exhausted by blocks, which are not used, but already dirty and have several free pages. Let's consider this function which execution should be pinned to one CPU: static void exhaust_virtual_space(struct page *pages[16], int iters) { /* Firstly we have to map a big chunk, e.g. 16 pages. * Then we have to occupy the remaining space with smaller * chunks, i.e. 8 pages. At the end small hole should remain. * So at the end of our allocation sequence block looks like * this: * XX big chunk * |XXxxxxxxx-| x small chunk * - hole, which is enough for a small chunk, * but is not enough for a big chunk */ while (iters--) { int i; void *vaddr; /* Map/unmap big chunk */ vaddr = vm_map_ram(pages, 16, -1, PAGE_KERNEL); vm_unmap_ram(vaddr, 16); /* Map/unmap small chunks. * * -1 for hole, which should be left at the end of each block * to keep it partially used, with some free space available */ for (i = 0; i < (VMAP_BBMAP_BITS - 16) / 8 - 1; i++) { vaddr = vm_map_ram(pages, 8, -1, PAGE_KERNEL); vm_unmap_ram(vaddr, 8); } } } On every iteration new block (1MB of vm area in my case) will be allocated and then will be occupied, without attempt to resolve small allocation request using previously allocated blocks in a free list. In case of random allocation (size should be randomly taken from the range [1..64] in 64-bit case or [1..32] in 32-bit case) situation is the same: new blocks continue to appear if maximum possible allocation size (32 or 64) passed to the allocator, because all remaining blocks in a free list do not have enough free space to complete this allocation request. In summary if new blocks are put into the head of a free list eventually virtual space will be exhausted. In current patch I simply put newly allocated block to the tail of a free list, thus reduce fragmentation, giving a chance to resolve allocation request using older blocks with possible holes left. Signed-off-by: Roman Pen <r.peniaev@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Christoph Lameter <cl@linux.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: Rob Jones <rob.jones@codethink.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thehajime
pushed a commit
that referenced
this issue
Apr 22, 2015
gcov profiling if enabled with other heavy compile-time instrumentation like KASan could trigger following softlockups: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] Modules linked in: irq event stamp: 22823276 hardirqs last enabled at (22823275): [<ffffffff86e8d10d>] mutex_lock_nested+0x7d9/0x930 hardirqs last disabled at (22823276): [<ffffffff86e9521d>] apic_timer_interrupt+0x6d/0x80 softirqs last enabled at (22823172): [<ffffffff811ed969>] __do_softirq+0x4db/0x729 softirqs last disabled at (22823167): [<ffffffff811edfcf>] irq_exit+0x7d/0x15b CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-05245-gbb33326-dirty #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000 RIP: kasan_mem_to_shadow+0x1e/0x1f Call Trace: strcmp+0x28/0x70 get_node_by_name+0x66/0x99 gcov_event+0x4f/0x69e gcov_enable_events+0x54/0x7b gcov_fs_init+0xf8/0x134 do_one_initcall+0x1b2/0x288 kernel_init_freeable+0x467/0x580 kernel_init+0x15/0x18b ret_from_fork+0x7c/0xb0 Kernel panic - not syncing: softlockup: hung tasks Fix this by sticking cond_resched() in gcov_enable_events(). Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thehajime
pushed a commit
that referenced
this issue
May 10, 2015
This patch doesn't make any effect on previous behavior, since f2fs_write_data_page bypasses writing the page during POR. But, the difference is that this patch avoids holding writepages mutex. This is to avoid the following false warning, since this can happen only when mount and shutdown are triggered at the same time. ====================================================== [ INFO: possible circular locking dependency detected ] 4.0.0-rc1+ #3 Tainted: G O ------------------------------------------------------- kworker/u8:0/2270 is trying to acquire lock: (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs] but task is already holding lock: (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sbi->writepages){+.+...}: [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs] [<ffffffff811c38c1>] do_writepages+0x21/0x50 [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0 [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0 [<ffffffff8126e425>] write_inode_now+0x95/0xa0 [<ffffffff81259dab>] iput+0x20b/0x3f0 [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs] [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs] [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs] [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0 [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff8123e159>] mount_fs+0x39/0x180 [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160 [<ffffffff81261554>] do_mount+0x204/0xbe0 [<ffffffff8126223b>] SyS_mount+0x8b/0xe0 [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b -> #1 (&sbi->cp_mutex){+.+...}: [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs] [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs] [<ffffffff81272f82>] sync_filesystem+0x82/0xb0 [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100 [<ffffffff8123c5f7>] kill_block_super+0x27/0x70 [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80 [<ffffffff8123d05e>] deactivate_super+0x4e/0x70 [<ffffffff8125df63>] cleanup_mnt+0x43/0x90 [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20 [<ffffffff810a82e4>] task_work_run+0xc4/0xf0 [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0 [<ffffffff81864141>] int_signal+0x12/0x17 -> #0 (&sbi->gc_mutex){+.+.+.}: [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90 [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs] [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs] [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs] [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0 [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs] [<ffffffff811c38c1>] do_writepages+0x21/0x50 [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0 [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710 [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0 [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850 [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980 [<ffffffff810a3782>] process_one_work+0x1e2/0x840 [<ffffffff810a3f01>] worker_thread+0x121/0x460 [<ffffffff810a9dc8>] kthread+0xf8/0x110 [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
thehajime
pushed a commit
that referenced
this issue
May 10, 2015
I think this is useful to verify whether a filter could be JITed or not in case of bpf_prog_enable >= 1, which otherwise the test suite doesn't tell besides taking a good peek at the performance numbers. Nicolas Schichan reported a bug in the ARM JIT compiler that rejected and waved the filter to the interpreter although it shouldn't have. Nevertheless, the test passes as expected, but such information is not visible. It's i.e. useful for the remaining classic JITs, but also for implementing remaining opcodes that are not yet present in eBPF JITs (e.g. ARM64 waves some of them to the interpreter). This minor patch allows to grep through dmesg to find those accordingly, but also provides a total summary, i.e.: [<X>/53 JIT'ed] # echo 1 > /proc/sys/net/core/bpf_jit_enable # insmod lib/test_bpf.ko # dmesg | grep "jited:0" dmesg example on the ARM issue with JIT rejection: [...] [ 67.925387] test_bpf: #2 ADD_SUB_MUL_K jited:1 24 PASS [ 67.930889] test_bpf: #3 DIV_MOD_KX jited:0 794 PASS [ 67.943940] test_bpf: #4 AND_OR_LSH_K jited:1 20 20 PASS [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
May 12, 2015
Andrew Lunn says: ==================== More Marvell DSA refactring and fixup This patch setup continues the refactoring and cleanup of the Marvell DSA drivers. Patch #1 Centralizes the duplicated parts of port setup and global setup into the shared mv88e6xxx. Patch #2 Centralizes looping over the ports setting them up Patch #3 Uses mnemonics for the remaining register access in the drivers. Patch #4 The 6172 is actually a member of the 6352 family. This moves the probe code into the correct driver. Patch #5 Adds more members of the 6171 family to the 6171 driver. The new devices are untested. Patch #6 The 6185 is a member of the 6131 family. Add it to the probe code of the 6131 driver. Patch #7 and Patch #8 Simply the mutex's in mv88e6xxx.c. The SMI bus is the bottleneck, not the granularity of the mutex's so simply the code down to a single mutex. Patch #8 Fixes a false positive lockdep splat, due to nested uses of MDIO busses. Patch #9 Fixes another false positive lockdep splat with the transmit queue because of stacked Ethernet devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
May 12, 2015
Perf top raise a warning if a kernel sample is collected but kernel map is restricted. The warning message needs to dereference al.map->dso... However, previous perf_event__preprocess_sample() doesn't always guarantee al.map != NULL, for example, when kernel map is restricted. This patch validates al.map before dereferencing, avoid the segfault. Before this patch: $ cat /proc/sys/kernel/kptr_restrict 1 $ perf top -p 120183 perf: Segmentation fault -------- backtrace -------- /path/to/perf[0x509868] /lib64/libc.so.6(+0x3545f)[0x7f9a1540045f] /path/to/perf[0x448820] /path/to/perf(cmd_top+0xe3c)[0x44a5dc] /path/to/perf[0x4766a2] /path/to/perf(main+0x5f5)[0x42e545] /lib64/libc.so.6(__libc_start_main+0xf4)[0x7f9a153ecbd4] /path/to/perf[0x42e674] And gdb call trace: Program received signal SIGSEGV, Segmentation fault. perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0) at builtin-top.c:736 736 !RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION]) ? (gdb) bt #0 perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0) at builtin-top.c:736 #1 perf_top__mmap_read_idx (top=top@entry=0x7fffffffa8a0, idx=idx@entry=0) at builtin-top.c:855 #2 0x000000000044a5dd in perf_top__mmap_read (top=0x7fffffffa8a0) at builtin-top.c:872 #3 __cmd_top (top=0x7fffffffa8a0) at builtin-top.c:997 #4 cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-top.c:1267 #5 0x00000000004766a3 in run_builtin (p=p@entry=0x8a6ce8 <commands+264>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdf70) at perf.c:371 #6 0x000000000042e546 in handle_internal_command (argv=0x7fffffffdf70, argc=3) at perf.c:430 #7 run_argv (argv=0x7fffffffdcf0, argcp=0x7fffffffdcfc) at perf.c:474 #8 main (argc=3, argv=0x7fffffffdf70) at perf.c:589 (gdb) Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1429946703-80807-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
thehajime
pushed a commit
that referenced
this issue
May 14, 2015
Sowmini Varadhan says: ==================== net/rds: RDS-TCP robustness fixes This patch-set contains bug fixes for state-recovery at the RDS layer when the underlying transport is TCP and the TCP state at one of the endpoints is reset V2 changes: DaveM comments to reduce memory footprint, follow NFS/RPC model where possible. Added test-case #3 Without the changes in this set, when one of the endpoints is reset, the existing code does not correctly clean up RDS socket state for stale connections, resulting in some unstable, timing-dependant behavior on the wire, including an infinite exchange of 3WHs back-and-forth, and a resulting potential to never converge RDS state. Test cases used to verify the changes in this set are: 1. Start rds client/server applications on two participating nodes, node1 and node2. After at least one packet has been sent (to establish the TCP connection), restart the rds_tcp module on the client, and now resend packets. Tcpdump should show server sending a FIN for the "old" client port, and clean connection establishment/exchange for the new client port. 2. At the end of step 1, restart rds srever on node2, and start client on node1, make sure using tcpdump, 'netstat -an|grep 16385' that packets flow correctly. 3. start RDS client/server application on two participating nodes, and repeat steps 1 and 2, but this time, simulate node failure by doing "ifconfig <intf> down", so no FIN is sent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
May 26, 2015
There's a race between releasing extent buffers that are flagged as stale and recycling them that makes us it the following BUG_ON at btrfs_release_extent_buffer_page: BUG_ON(extent_buffer_under_io(eb)) The BUG_ON is triggered because the extent buffer has the flag EXTENT_BUFFER_DIRTY set as a consequence of having been reused and made dirty by another concurrent task. Here follows a sequence of steps that leads to the BUG_ON. CPU 0 CPU 1 CPU 2 path->nodes[0] == eb X X->refs == 2 (1 for the tree, 1 for the path) btrfs_header_generation(X) == current trans id flag EXTENT_BUFFER_DIRTY set on X btrfs_release_path(path) unlocks X reads eb X X->refs incremented to 3 locks eb X btrfs_del_items(X) X becomes empty clean_tree_block(X) clear EXTENT_BUFFER_DIRTY from X btrfs_del_leaf(X) unlocks X extent_buffer_get(X) X->refs incremented to 4 btrfs_free_tree_block(X) X's range is not pinned X's range added to free space cache free_extent_buffer_stale(X) lock X->refs_lock set EXTENT_BUFFER_STALE on X release_extent_buffer(X) X->refs decremented to 3 unlocks X->refs_lock btrfs_release_path() unlocks X free_extent_buffer(X) X->refs becomes 2 __btrfs_cow_block(Y) btrfs_alloc_tree_block() btrfs_reserve_extent() find_free_extent() gets offset == X->start btrfs_init_new_buffer(X->start) btrfs_find_create_tree_block(X->start) alloc_extent_buffer(X->start) find_extent_buffer(X->start) finds eb X in radix tree free_extent_buffer(X) lock X->refs_lock test X->refs == 2 test bit EXTENT_BUFFER_STALE is set test !extent_buffer_under_io(eb) increments X->refs to 3 mark_extent_buffer_accessed(X) check_buffer_tree_ref(X) --> does nothing, X->refs >= 2 and EXTENT_BUFFER_TREE_REF is set in X clear EXTENT_BUFFER_STALE from X locks X btrfs_mark_buffer_dirty() set_extent_buffer_dirty(X) check_buffer_tree_ref(X) --> does nothing, X->refs >= 2 and EXTENT_BUFFER_TREE_REF is set sets EXTENT_BUFFER_DIRTY on X test and clear EXTENT_BUFFER_TREE_REF decrements X->refs to 2 release_extent_buffer(X) decrements X->refs to 1 unlock X->refs_lock unlock X free_extent_buffer(X) lock X->refs_lock release_extent_buffer(X) decrements X->refs to 0 btrfs_release_extent_buffer_page(X) BUG_ON(extent_buffer_under_io(X)) --> EXTENT_BUFFER_DIRTY set on X Fix this by making find_extent buffer wait for any ongoing task currently executing free_extent_buffer()/free_extent_buffer_stale() if the extent buffer has the stale flag set. A more clean alternative would be to always increment the extent buffer's reference count while holding its refs_lock spinlock but find_extent_buffer is a performance critical area and that would cause lock contention whenever multiple tasks search for the same extent buffer concurrently. A build server running a SLES 12 kernel (3.12 kernel + over 450 upstream btrfs patches backported from newer kernels) was hitting this often: [1212302.461948] kernel BUG at ../fs/btrfs/extent_io.c:4507! (...) [1212302.470219] CPU: 1 PID: 19259 Comm: bs_sched Not tainted 3.12.36-38-default #1 [1212302.540792] Hardware name: Supermicro PDSM4/PDSM4, BIOS 6.00 04/17/2006 [1212302.540792] task: ffff8800e07e0100 ti: ffff8800d6412000 task.ti: ffff8800d6412000 [1212302.540792] RIP: 0010:[<ffffffffa0507081>] [<ffffffffa0507081>] btrfs_release_extent_buffer_page.constprop.51+0x101/0x110 [btrfs] (...) [1212302.630008] Call Trace: [1212302.630008] [<ffffffffa05070cd>] release_extent_buffer+0x3d/0xa0 [btrfs] [1212302.630008] [<ffffffffa04c2d9d>] btrfs_release_path+0x1d/0xa0 [btrfs] [1212302.630008] [<ffffffffa04c5c7e>] read_block_for_search.isra.33+0x13e/0x3a0 [btrfs] [1212302.630008] [<ffffffffa04c8094>] btrfs_search_slot+0x3f4/0xa80 [btrfs] [1212302.630008] [<ffffffffa04cf5d8>] lookup_inline_extent_backref+0xf8/0x630 [btrfs] [1212302.630008] [<ffffffffa04d13dd>] __btrfs_free_extent+0x11d/0xc40 [btrfs] [1212302.630008] [<ffffffffa04d64a4>] __btrfs_run_delayed_refs+0x394/0x11d0 [btrfs] [1212302.630008] [<ffffffffa04db379>] btrfs_run_delayed_refs.part.66+0x69/0x280 [btrfs] [1212302.630008] [<ffffffffa04ed2ad>] __btrfs_end_transaction+0x2ad/0x3d0 [btrfs] [1212302.630008] [<ffffffffa04f7505>] btrfs_evict_inode+0x4a5/0x500 [btrfs] [1212302.630008] [<ffffffff811b9e28>] evict+0xa8/0x190 [1212302.630008] [<ffffffff811b0330>] do_unlinkat+0x1a0/0x2b0 I was also able to reproduce this on a 3.19 kernel, corresponding to Chris' integration branch from about a month ago, running the following stress test on a qemu/kvm guest (with 4 virtual cpus and 16Gb of ram): while true; do mkfs.btrfs -l 4096 -f -b `expr 20 \* 1024 \* 1024 \* 1024` /dev/sdd mount /dev/sdd /mnt snapshot_cmd="btrfs subvolume snapshot -r /mnt" snapshot_cmd="$snapshot_cmd /mnt/snap_\`date +'%H_%M_%S_%N'\`" fsstress -d /mnt -n 25000 -p 8 -x "$snapshot_cmd" -X 100 umount /mnt done Which usually triggers the BUG_ON within less than 24 hours: [49558.618097] ------------[ cut here ]------------ [49558.619732] kernel BUG at fs/btrfs/extent_io.c:4551! (...) [49558.620031] CPU: 3 PID: 23908 Comm: fsstress Tainted: G W 3.19.0-btrfs-next-7+ #3 [49558.620031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [49558.620031] task: ffff8800319fc0d0 ti: ffff880220da8000 task.ti: ffff880220da8000 [49558.620031] RIP: 0010:[<ffffffffa0476b1a>] [<ffffffffa0476b1a>] btrfs_release_extent_buffer_page+0x20/0xe9 [btrfs] (...) [49558.620031] Call Trace: [49558.620031] [<ffffffffa0476c73>] release_extent_buffer+0x90/0xd3 [btrfs] [49558.620031] [<ffffffff8142b10c>] ? _raw_spin_lock+0x3b/0x43 [49558.620031] [<ffffffffa0477052>] ? free_extent_buffer+0x37/0x94 [btrfs] [49558.620031] [<ffffffffa04770ab>] free_extent_buffer+0x90/0x94 [btrfs] [49558.620031] [<ffffffffa04396d5>] btrfs_release_path+0x4a/0x69 [btrfs] [49558.620031] [<ffffffffa0444907>] __btrfs_free_extent+0x778/0x80c [btrfs] [49558.620031] [<ffffffffa044a485>] __btrfs_run_delayed_refs+0xad2/0xc62 [btrfs] [49558.728054] [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18 [49558.728054] [<ffffffffa044c1e8>] btrfs_run_delayed_refs+0x6d/0x1ba [btrfs] [49558.728054] [<ffffffffa045917f>] ? join_transaction.isra.9+0xb9/0x36b [btrfs] [49558.728054] [<ffffffffa045a75c>] btrfs_commit_transaction+0x4c/0x981 [btrfs] [49558.728054] [<ffffffffa0434f86>] btrfs_sync_fs+0xd5/0x10d [btrfs] [49558.728054] [<ffffffff81155923>] ? iterate_supers+0x60/0xc4 [49558.728054] [<ffffffff8117966a>] ? do_sync_work+0x91/0x91 [49558.728054] [<ffffffff8117968a>] sync_fs_one_sb+0x20/0x22 [49558.728054] [<ffffffff81155939>] iterate_supers+0x76/0xc4 [49558.728054] [<ffffffff811798e8>] sys_sync+0x55/0x83 [49558.728054] [<ffffffff8142bbd2>] system_call_fastpath+0x12/0x17 Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
thehajime
pushed a commit
that referenced
this issue
May 26, 2015
mixer_regs_dump() was called in mixer_run(), which was called under the register spinlock in mixer_graph_buffer() and vp_video_buffer(). This would trigger a sysmmu pagefault with drm.debug=0xff because of the large delay caused by the register dumping. To keep consistency also move register dumping out of mixer_stop(), which is the counterpart to mixer_run(). Kernel dump: [ 131.296529] [drm:mixer_win_commit] win: 2 [ 131.300693] [drm:mixer_regs_dump] MXR_STATUS = 00000081 [ 131.305888] [drm:mixer_regs_dump] MXR_CFG = 000007d5 [ 131.310835] [drm:mixer_regs_dump] MXR_INT_EN = 00000000 [ 131.316043] [drm:mixer_regs_dump] MXR_INT_STATUS = 00000900 [ 131.321598] [drm:mixer_regs_dump] MXR_LAYER_CFG = 00000321 [ 131.327066] [drm:mixer_regs_dump] MXR_VIDEO_CFG = 00000000 [ 131.332535] [drm:mixer_regs_dump] MXR_GRAPHIC0_CFG = 00310700 [ 131.338263] [drm:mixer_regs_dump] MXR_GRAPHIC0_BASE = 20c00000 [ 131.344079] [drm:mixer_regs_dump] MXR_GRAPHIC0_SPAN = 00000780 [ 131.349895] [drm:mixer_regs_dump] MXR_GRAPHIC0_WH = 07800438 [ 131.355537] [drm:mixer_regs_dump] MXR_GRAPHIC0_SXY = 00000000 [ 131.361265] [drm:mixer_regs_dump] MXR_GRAPHIC0_DXY = 00000000 [ 131.366994] [drm:mixer_regs_dump] MXR_GRAPHIC1_CFG = 00000000 [ 131.372723] [drm:mixer_regs_dump] MXR_GRAPHIC1_BASE = 00000000 [ 131.378539] [drm:mixer_regs_dump] MXR_GRAPHIC1_SPAN = 00000000 [ 131.384354] [drm:mixer_regs_dump] MXR_GRAPHIC1_WH = 00000000 [ 131.389996] [drm:mixer_regs_dump] MXR_GRAPHIC1_SXY = 00000000 [ 131.395725] [drm:mixer_regs_dump] MXR_GRAPHIC1_DXY = 00000000 [ 131.401486] PAGE FAULT occurred at 0x0 by 12e20000.sysmmu(Page table base: 0x6d990000) [ 131.409353] Lv1 entry: 0x6e0f2401 [ 131.412753] ------------[ cut here ]------------ [ 131.417339] kernel BUG at drivers/iommu/exynos-iommu.c:358! [ 131.422894] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [ 131.428709] Modules linked in: ecb bridge stp llc bnep btrfs xor xor_neon zlib_inflate zlib_deflate raid6_pq btusb bluetooth usb_storage s5p_jpeg videobuf2_dma_contig videobuf2_memops v4l2_mem2mem videobuf2_core [ 131.447461] CPU: 0 PID: 2418 Comm: lt-modetest Tainted: G W 4.0.1-debug+ #3 [ 131.455530] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 131.461607] task: ee194100 ti: ec4fe000 task.ti: ec4fe000 [ 131.466995] PC is at exynos_sysmmu_irq+0x2a0/0x2a8 [ 131.471766] LR is at vprintk_emit+0x268/0x594 [ 131.476103] pc : [<c02781a4>] lr : [<c00650d0>] psr: a00001d3 [ 131.476103] sp : ec4ff9d8 ip : 00000000 fp : ec4ffa14 [ 131.487559] r10: ffffffda r9 : ee206e28 r8 : ee2d1a10 [ 131.492767] r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : ee206e10 [ 131.499277] r3 : c06fca20 r2 : 00000000 r1 : 00000000 r0 : ee28be00 [ 131.505788] Flags: NzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment user [ 131.513079] Control: 10c5387d Table: 6c72404a DAC: 00000015 [ 131.518808] Process lt-modetest (pid: 2418, stack limit = 0xec4fe218) [ 131.525231] Stack: (0xec4ff9d8 to 0xec500000) [ 131.529571] f9c0: ec4ff9e4 c03a0c40 [ 131.537732] f9e0: bbfa6e35 6d990000 6d161c3d ee20a900 ee04a7e0 00000028 ee007000 00000000 [ 131.545891] fa00: 00000000 c06fb1fc ec4ffa5c ec4ffa18 c0066a34 c0277f10 ee257664 0000000b [ 131.554050] fa20: ec4ffa5c c06fafbb ee04a780 c06fb1e8 00000000 ee04a780 ee04a7e0 ee20a900 [ 131.562209] fa40: ee007000 00000015 ec4ffb48 ee008000 ec4ffa7c ec4ffa60 c0066c90 c00669e0 [ 131.570369] fa60: 00020000 ee04a780 ee04a7e0 00001000 ec4ffa94 ec4ffa80 c0069c6c c0066c58 [ 131.578528] fa80: 00000028 ee004450 ec4ffaac ec4ffa98 c0066028 c0069bac 000000a0 c06e19b4 [ 131.586687] faa0: ec4ffad4 ec4ffab0 c0223678 c0066000 c02235dc 00000015 00000000 00000015 [ 131.594846] fac0: ec4ffc80 00000001 ec4ffaec ec4ffad8 c0066028 c02235e8 00000089 c06bfc54 [ 131.603005] fae0: ec4ffb1c ec4ffaf0 c006633c c0066000 ec4ffb48 f002000c 00000025 00000015 [ 131.611165] fb00: c06c680c ec4ffb48 f0020000 ee008000 ec4ffb44 ec4ffb20 c000867c c00662c4 [ 131.619324] fb20: c02046ac 60000153 ffffffff ec4ffb7c 00000000 00000101 ec4ffbb4 ec4ffb48 [ 131.627483] fb40: c0013240 c0008650 00000001 ee257508 00000002 00000001 ee257504 ee257508 [ 131.635642] fb60: 00000000 c06bf27c 00000000 00000101 ee008000 ec4ffbb4 00000000 ec4ffb90 [ 131.643802] fb80: c002e124 c02046ac 60000153 ffffffff c002e09c 00000000 c06c6080 00000283 [ 131.651960] fba0: 00000001 c06fb1ac ec4ffc0c ec4ffbb8 c002d690 c002e0a8 ee78d080 ee008000 [ 131.660120] fbc0: 00400000 c04eb3b0 ffff7c44 c06c6100 c06fdac0 0000000a c06bf2f0 c06c6080 [ 131.668279] fbe0: c06bfc54 c06bfc54 00000000 00000025 00000000 00000001 ec4ffc80 ee008000 [ 131.676438] fc00: ec4ffc24 ec4ffc10 c002dbb8 c002d564 00000089 c06bfc54 ec4ffc54 ec4ffc28 [ 131.684597] fc20: c0066340 c002dafc ec4ffc80 f002000c 0000001c 0000000c c06c680c ec4ffc80 [ 131.692757] fc40: f0020000 00000080 ec4ffc7c ec4ffc58 c000867c c00662c4 c04e6624 60000053 [ 131.700916] fc60: ffffffff ec4ffcb4 c072df54 ee22d010 ec4ffcdc ec4ffc80 c0013240 c0008650 [ 131.709075] fc80: ee22d664 ee194100 00000000 ec4fe000 60000053 00000400 00000002 ee22d420 [ 131.717234] fca0: c072df54 ee22d010 00000080 ec4ffcdc ec4ffcc8 ec4ffcc8 c04e6620 c04e6624 [ 131.725393] fcc0: 60000053 ffffffff ec4fe000 c072df54 ec4ffd34 ec4ffce0 c02b64d0 c04e6618 [ 131.733552] fce0: ec4ffcf8 00000000 00000000 60000053 00010000 00010000 00000000 200cb000 [ 131.741712] fd00: 20080000 ee22d664 00000001 ee256000 ee261400 ee22d420 00000080 00000080 [ 131.749871] fd20: ee256000 00000280 ec4ffd74 ec4ffd38 c02a8844 c02b5fec 00000080 00000280 [ 131.758030] fd40: 000001e0 00000000 00000000 00000280 000001e0 ee22d220 01e00000 00000002 [ 131.766189] fd60: ee22d420 ee261400 ec4ffdbc ec4ffd78 c0293cbc c02a87a4 00000080 00000280 [ 131.774348] fd80: 000001e0 00000000 00000000 02800000 01e00000 ee261400 ee22d460 ee261400 [ 131.782508] fda0: ee22d420 00000000 01e00000 000001e0 ec4ffe24 ec4ffdc0 c0297800 c0293b24 [ 131.790667] fdc0: 00000080 00000280 000001e0 00000000 00000000 02800000 01e00000 ec4ffdf8 [ 131.798826] fde0: c028db00 00000080 00000080 ee256000 02800000 00000000 ec4ffe24 c06c6448 [ 131.806985] fe00: c072df54 000000b7 ee013800 ec4ffe54 edbf7300 ec4ffe54 ec4fff04 ec4ffe28 [ 131.815145] fe20: c028a848 c029768c 00000001 c06195d8 ec4ffe5c ec4ffe40 c0297680 c0521f6c [ 131.823304] fe40: 00000030 bed45d38 00000030 c03064b7 ec4ffe8c 00000011 00000015 00000022 [ 131.831463] fe60: 00000000 00000080 00000080 00000280 000001e0 00000000 00000000 01e00000 [ 131.839622] fe80: 02800000 00000000 00000000 0004b000 00000000 00000000 c00121e4 c0011080 [ 131.847781] fea0: c00110a4 00000000 00000000 00000000 ec4ffeec ec4ffec0 c00110f0 c00121cc [ 131.855940] fec0: 00000000 c00e7fec ec4ffeec ec4ffed8 c004af2c dc8ba201 edae4fc0 edbf7000 [ 131.864100] fee0: edbf7000 00000003 bed45d38 00000003 bed45d38 ee3f2040 ec4fff7c ec4fff08 [ 131.872259] ff00: c010b62c c028a684 edae4fc0 00000000 00000000 b6666000 ec40d108 edae4fc4 [ 131.880418] ff20: ec4fff6c ec4fff30 c00e7fec c02207b0 000001f9 00000000 edae5008 ec40d110 [ 131.888577] ff40: 00070800 edae5008 edae4fc0 00070800 b6666000 edbf7000 edbf7000 c03064b7 [ 131.896736] ff60: bed45d38 00000003 ec4fe000 00000000 ec4fffa4 ec4fff80 c010b84c c010b208 [ 131.904896] ff80: 00000022 00000000 bed45d38 c03064b7 00000036 c000ede4 00000000 ec4fffa8 [ 131.913055] ffa0: c000ec40 c010b81c 00000000 bed45d38 00000003 c03064b7 bed45d38 00000022 [ 131.921214] ffc0: 00000000 bed45d38 c03064b7 00000036 00000080 00000080 00000000 000001e0 [ 131.929373] ffe0: b6da4064 bed45d1c b6d98968 b6e8082c 60000050 00000003 00000000 00000000 [ 131.937529] Backtrace: [ 131.939967] [<c0277f04>] (exynos_sysmmu_irq) from [<c0066a34>] (handle_irq_event_percpu+0x60/0x278) [ 131.948988] r10:c06fb1fc r9:00000000 r8:00000000 r7:ee007000 r6:00000028 r5:ee04a7e0 [ 131.956799] r4:ee20a900 [ 131.959320] [<c00669d4>] (handle_irq_event_percpu) from [<c0066c90>] (handle_irq_event+0x44/0x64) [ 131.968170] r10:ee008000 r9:ec4ffb48 r8:00000015 r7:ee007000 r6:ee20a900 r5:ee04a7e0 [ 131.975982] r4:ee04a780 [ 131.978504] [<c0066c4c>] (handle_irq_event) from [<c0069c6c>] (handle_level_irq+0xcc/0x144) [ 131.986832] r6:00001000 r5:ee04a7e0 r4:ee04a780 r3:00020000 [ 131.992478] [<c0069ba0>] (handle_level_irq) from [<c0066028>] (generic_handle_irq+0x34/0x44) [ 132.000894] r5:ee004450 r4:00000028 [ 132.004459] [<c0065ff4>] (generic_handle_irq) from [<c0223678>] (combiner_handle_cascade_irq+0x9c/0x108) [ 132.013914] r4:c06e19b4 r3:000000a0 [ 132.017476] [<c02235dc>] (combiner_handle_cascade_irq) from [<c0066028>] (generic_handle_irq+0x34/0x44) [ 132.026847] r8:00000001 r7:ec4ffc80 r6:00000015 r5:00000000 r4:00000015 r3:c02235dc [ 132.034576] [<c0065ff4>] (generic_handle_irq) from [<c006633c>] (__handle_domain_irq+0x84/0xf0) [ 132.043252] r4:c06bfc54 r3:00000089 [ 132.046815] [<c00662b8>] (__handle_domain_irq) from [<c000867c>] (gic_handle_irq+0x38/0x70) [ 132.055144] r10:ee008000 r9:f0020000 r8:ec4ffb48 r7:c06c680c r6:00000015 r5:00000025 [ 132.062956] r4:f002000c r3:ec4ffb48 [ 132.066520] [<c0008644>] (gic_handle_irq) from [<c0013240>] (__irq_svc+0x40/0x74) [ 132.073980] Exception stack(0xec4ffb48 to 0xec4ffb90) [ 132.079016] fb40: 00000001 ee257508 00000002 00000001 ee257504 ee257508 [ 132.087176] fb60: 00000000 c06bf27c 00000000 00000101 ee008000 ec4ffbb4 00000000 ec4ffb90 [ 132.095333] fb80: c002e124 c02046ac 60000153 ffffffff [ 132.100367] r9:00000101 r8:00000000 r7:ec4ffb7c r6:ffffffff r5:60000153 r4:c02046ac [ 132.108098] [<c002e09c>] (tasklet_hi_action) from [<c002d690>] (__do_softirq+0x138/0x38c) [ 132.116251] r8:c06fb1ac r7:00000001 r6:00000283 r5:c06c6080 r4:00000000 r3:c002e09c [ 132.123980] [<c002d558>] (__do_softirq) from [<c002dbb8>] (irq_exit+0xc8/0x104) [ 132.131268] r10:ee008000 r9:ec4ffc80 r8:00000001 r7:00000000 r6:00000025 r5:00000000 [ 132.139080] r4:c06bfc54 [ 132.141600] [<c002daf0>] (irq_exit) from [<c0066340>] (__handle_domain_irq+0x88/0xf0) [ 132.149409] r4:c06bfc54 r3:00000089 [ 132.152971] [<c00662b8>] (__handle_domain_irq) from [<c000867c>] (gic_handle_irq+0x38/0x70) [ 132.161300] r10:00000080 r9:f0020000 r8:ec4ffc80 r7:c06c680c r6:0000000c r5:0000001c [ 132.169112] r4:f002000c r3:ec4ffc80 [ 132.172675] [<c0008644>] (gic_handle_irq) from [<c0013240>] (__irq_svc+0x40/0x74) [ 132.180137] Exception stack(0xec4ffc80 to 0xec4ffcc8) [ 132.185173] fc80: ee22d664 ee194100 00000000 ec4fe000 60000053 00000400 00000002 ee22d420 [ 132.193332] fca0: c072df54 ee22d010 00000080 ec4ffcdc ec4ffcc8 ec4ffcc8 c04e6620 c04e6624 [ 132.201489] fcc0: 60000053 ffffffff [ 132.204961] r9:ee22d010 r8:c072df54 r7:ec4ffcb4 r6:ffffffff r5:60000053 r4:c04e6624 [ 132.212694] [<c04e660c>] (_raw_spin_unlock_irqrestore) from [<c02b64d0>] (mixer_win_commit+0x4f0/0xcc8) [ 132.222060] r4:c072df54 r3:ec4fe000 [ 132.225625] [<c02b5fe0>] (mixer_win_commit) from [<c02a8844>] (exynos_update_plane+0xac/0xb8) [ 132.234126] r10:00000280 r9:ee256000 r8:00000080 r7:00000080 r6:ee22d420 r5:ee261400 [ 132.241937] r4:ee256000 [ 132.244461] [<c02a8798>] (exynos_update_plane) from [<c0293cbc>] (__setplane_internal+0x1a4/0x2c0) [ 132.253395] r7:ee261400 r6:ee22d420 r5:00000002 r4:01e00000 [ 132.259041] [<c0293b18>] (__setplane_internal) from [<c0297800>] (drm_mode_setplane+0x180/0x244) [ 132.267804] r9:000001e0 r8:01e00000 r7:00000000 r6:ee22d420 r5:ee261400 r4:ee22d460 [ 132.275535] [<c0297680>] (drm_mode_setplane) from [<c028a848>] (drm_ioctl+0x1d0/0x58c) [ 132.283428] r10:ec4ffe54 r9:edbf7300 r8:ec4ffe54 r7:ee013800 r6:000000b7 r5:c072df54 [ 132.291240] r4:c06c6448 [ 132.293763] [<c028a678>] (drm_ioctl) from [<c010b62c>] (do_vfs_ioctl+0x430/0x614) [ 132.301222] r10:ee3f2040 r9:bed45d38 r8:00000003 r7:bed45d38 r6:00000003 r5:edbf7000 [ 132.309034] r4:edbf7000 [ 132.311555] [<c010b1fc>] (do_vfs_ioctl) from [<c010b84c>] (SyS_ioctl+0x3c/0x64) [ 132.318842] r10:00000000 r9:ec4fe000 r8:00000003 r7:bed45d38 r6:c03064b7 r5:edbf7000 [ 132.326654] r4:edbf7000 [ 132.329176] [<c010b810>] (SyS_ioctl) from [<c000ec40>] (ret_fast_syscall+0x0/0x34) [ 132.336723] r8:c000ede4 r7:00000036 r6:c03064b7 r5:bed45d38 r4:00000000 r3:00000022 [ 132.344451] Code: e3130002 0affffaf eb09a67d eaffffad (e7f001f2) [ 132.350528] ---[ end trace d428689b94df895c ]--- [ 132.355126] Kernel panic - not syncing: Fatal exception in interrupt [ 132.361465] CPU2: stopping [ 132.364155] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.0.1-debug+ #3 [ 132.371791] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 132.377866] Backtrace: [ 132.380304] [<c0012484>] (dump_backtrace) from [<c001269c>] (show_stack+0x18/0x1c) [ 132.387849] r6:c06e158c r5:ffffffff r4:00000000 r3:dc8ba201 [ 132.393497] [<c0012684>] (show_stack) from [<c04dfb94>] (dump_stack+0x88/0xc8) [ 132.400698] [<c04dfb0c>] (dump_stack) from [<c0014894>] (handle_IPI+0x1c8/0x2c4) [ 132.408073] r6:c06bfc54 r5:c06bfc54 r4:00000005 r3:ee0b0000 [ 132.413718] [<c00146cc>] (handle_IPI) from [<c00086b0>] (gic_handle_irq+0x6c/0x70) [ 132.421267] r9:f0028000 r8:ee0b1f48 r7:c06c680c r6:fffffff5 r5:00000005 r4:f002800c [ 132.428995] [<c0008644>] (gic_handle_irq) from [<c0013240>] (__irq_svc+0x40/0x74) [ 132.436457] Exception stack(0xee0b1f48 to 0xee0b1f90) [ 132.441493] 1f40: 00000001 00000000 00000000 c00206c0 c06c6518 c04eb3a4 [ 132.449653] 1f60: 00000000 00000000 c06c0dc0 00000001 c06fb774 ee0b1f9c ee0b1fa0 ee0b1f90 [ 132.457811] 1f80: c000f82c c000f830 600f0053 ffffffff [ 132.462844] r9:00000001 r8:c06c0dc0 r7:ee0b1f7c r6:ffffffff r5:600f0053 r4:c000f830 [ 132.470575] [<c000f7f0>] (arch_cpu_idle) from [<c005b6e8>] (cpu_startup_entry+0x318/0x4ec) [ 132.478818] [<c005b3d0>] (cpu_startup_entry) from [<c00144d0>] (secondary_start_kernel+0xf4/0x100) [ 132.487755] r7:c06fd440 [ 132.490279] [<c00143dc>] (secondary_start_kernel) from [<40008744>] (0x40008744) [ 132.497651] r4:6e09006a r3:c000872c [ 132.501210] CPU3: stopping [ 132.503904] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D W 4.0.1-debug+ #3 [ 132.511539] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 132.517614] Backtrace: [ 132.520051] [<c0012484>] (dump_backtrace) from [<c001269c>] (show_stack+0x18/0x1c) [ 132.527597] r6:c06e158c r5:ffffffff r4:00000000 r3:dc8ba201 [ 132.533243] [<c0012684>] (show_stack) from [<c04dfb94>] (dump_stack+0x88/0xc8) [ 132.540446] [<c04dfb0c>] (dump_stack) from [<c0014894>] (handle_IPI+0x1c8/0x2c4) [ 132.547820] r6:c06bfc54 r5:c06bfc54 r4:00000005 r3:ee0b2000 [ 132.553466] [<c00146cc>] (handle_IPI) from [<c00086b0>] (gic_handle_irq+0x6c/0x70) [ 132.561014] r9:f002c000 r8:ee0b3f48 r7:c06c680c r6:fffffff5 r5:00000005 r4:f002c00c [ 132.568743] [<c0008644>] (gic_handle_irq) from [<c0013240>] (__irq_svc+0x40/0x74) [ 132.576205] Exception stack(0xee0b3f48 to 0xee0b3f90) [ 132.581241] 3f40: 00000001 00000000 00000000 c00206c0 c06c6518 c04eb3a4 [ 132.589401] 3f60: 00000000 00000000 c06c0dc0 00000001 c06fb774 ee0b3f9c ee0b3fa0 ee0b3f90 [ 132.597558] 3f80: c000f82c c000f830 600f0053 ffffffff [ 132.602591] r9:00000001 r8:c06c0dc0 r7:ee0b3f7c r6:ffffffff r5:600f0053 r4:c000f830 [ 132.610321] [<c000f7f0>] (arch_cpu_idle) from [<c005b6e8>] (cpu_startup_entry+0x318/0x4ec) [ 132.618566] [<c005b3d0>] (cpu_startup_entry) from [<c00144d0>] (secondary_start_kernel+0xf4/0x100) [ 132.627503] r7:c06fd440 [ 132.630023] [<c00143dc>] (secondary_start_kernel) from [<40008744>] (0x40008744) [ 132.637399] r4:6e09006a r3:c000872c [ 132.640958] CPU1: stopping [ 132.643651] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 4.0.1-debug+ #3 [ 132.651287] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 132.657362] Backtrace: [ 132.659799] [<c0012484>] (dump_backtrace) from [<c001269c>] (show_stack+0x18/0x1c) [ 132.667344] r6:c06e158c r5:ffffffff r4:00000000 r3:dc8ba201 [ 132.672991] [<c0012684>] (show_stack) from [<c04dfb94>] (dump_stack+0x88/0xc8) [ 132.680194] [<c04dfb0c>] (dump_stack) from [<c0014894>] (handle_IPI+0x1c8/0x2c4) [ 132.687569] r6:c06bfc54 r5:c06bfc54 r4:00000005 r3:ee0ae000 [ 132.693214] [<c00146cc>] (handle_IPI) from [<c00086b0>] (gic_handle_irq+0x6c/0x70) [ 132.700762] r9:f0024000 r8:ee0aff48 r7:c06c680c r6:fffffff5 r5:00000005 r4:f002400c [ 132.708491] [<c0008644>] (gic_handle_irq) from [<c0013240>] (__irq_svc+0x40/0x74) [ 132.715953] Exception stack(0xee0aff48 to 0xee0aff90) [ 132.720989] ff40: 00000001 00000000 00000000 c00206c0 c06c6518 c04eb3a4 [ 132.729149] ff60: 00000000 00000000 c06c0dc0 00000001 c06fb774 ee0aff9c ee0affa0 ee0aff90 [ 132.737306] ff80: c000f82c c000f830 60070053 ffffffff [ 132.742339] r9:00000001 r8:c06c0dc0 r7:ee0aff7c r6:ffffffff r5:60070053 r4:c000f830 [ 132.750069] [<c000f7f0>] (arch_cpu_idle) from [<c005b6e8>] (cpu_startup_entry+0x318/0x4ec) [ 132.758314] [<c005b3d0>] (cpu_startup_entry) from [<c00144d0>] (secondary_start_kernel+0xf4/0x100) [ 132.767251] r7:c06fd440 [ 132.769772] [<c00143dc>] (secondary_start_kernel) from [<40008744>] (0x40008744) [ 132.777146] r4:6e09006a r3:c000872c [ 132.780709] ---[ end Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Inki Dae <inki.dae@samsung.com>
thehajime
pushed a commit
that referenced
this issue
Jul 14, 2015
…o next/late Merge "ARM: mvebu: dt changes for v4.2" from Gregory Clement mvebu dt changes for v4.2 (part #3) Update Armada XP DT spi muxing after pinctrl function rename which was merged in the pinctrl subsystem for 4.2. Without it the spi muxing will be broken in 4.2-rc1 for Armada XP. * tag 'mvebu-dt-4.2-3' of git://git.infradead.org/linux-mvebu: ARM: mvebu: adjust Armada XP DT spi muxing after pinctrl function rename
thehajime
pushed a commit
that referenced
this issue
Aug 14, 2015
Jiri Pirko says: ==================== Introduce Mellanox Technologies Switch ASICs switchdev drivers This patchset introduces Mellanox Technologies Switch driver infrastructure and support for SwitchX-2 ASIC. The driver is divided into 3 logical parts: 1) Bus - implements switch bus interface. Currently only PCI bus is implemented, but more buses will be added in the future. Namely I2C and SGMII. (patch #2) 2) Driver - implemements of ASIC-specific functions. Currently SwitchX-2 ASIC is supported, but a plan exists to introduce support for Spectrum ASIC in the near future. (patch #4) 3) Core - infrastructure that glues buses and drivers together. It implements register access logic (EMADs) and takes care of RX traps and events. (patch #1 and #3) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
Aug 14, 2015
Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: stable@vger.kernel.org # 3.9+ [tytso@mit.edu: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thehajime
pushed a commit
that referenced
this issue
Aug 14, 2015
Commit 0e1cc95 ("mm: meminit: finish initialisation of struct pages before basic setup") introduced a rwsem to signal completion of the initialization workers. Lockdep complains about possible recursive locking: ============================================= [ INFO: possible recursive locking detected ] 4.1.0-12802-g1dc51b8 #3 Not tainted --------------------------------------------- swapper/0/1 is trying to acquire lock: (pgdat_init_rwsem){++++.+}, at: [<ffffffff8424c7fb>] page_alloc_init_late+0xc7/0xe6 but task is already holding lock: (pgdat_init_rwsem){++++.+}, at: [<ffffffff8424c772>] page_alloc_init_late+0x3e/0xe6 Replace the rwsem by a completion together with an atomic "outstanding work counter". [peterz@infradead.org: Barrier removal on the grounds of being pointless] [mgorman@suse.de: Applied review feedback] Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alex Ng <alexng@microsoft.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thehajime
pushed a commit
that referenced
this issue
Aug 14, 2015
The shm implementation internally uses shmem or hugetlbfs inodes for shm segments. As these inodes are never directly exposed to userspace and only accessed through the shm operations which are already hooked by security modules, mark the inodes with the S_PRIVATE flag so that inode security initialization and permission checking is skipped. This was motivated by the following lockdep warning: ====================================================== [ INFO: possible circular locking dependency detected ] 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W ------------------------------------------------------- httpd/1597 is trying to acquire lock: (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130 but task is already holding lock: (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc7/0x270 __might_fault+0x7a/0xa0 filldir+0x9e/0x130 xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] xfs_readdir+0x1b4/0x330 [xfs] xfs_file_readdir+0x2b/0x30 [xfs] iterate_dir+0x97/0x130 SyS_getdents+0x91/0x120 entry_SYSCALL_64_fastpath+0x12/0x76 -> #2 (&xfs_dir_ilock_class){++++.+}: lock_acquire+0xc7/0x270 down_read_nested+0x57/0xa0 xfs_ilock+0x167/0x350 [xfs] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] xfs_attr_get+0xbd/0x190 [xfs] xfs_xattr_get+0x3d/0x70 [xfs] generic_getxattr+0x4f/0x70 inode_doinit_with_dentry+0x162/0x670 sb_finish_set_opts+0xd9/0x230 selinux_set_mnt_opts+0x35c/0x660 superblock_doinit+0x77/0xf0 delayed_superblock_init+0x10/0x20 iterate_supers+0xb3/0x110 selinux_complete_init+0x2f/0x40 security_load_policy+0x103/0x600 sel_write_load+0xc1/0x750 __vfs_write+0x37/0x100 vfs_write+0xa9/0x1a0 SyS_write+0x58/0xd0 entry_SYSCALL_64_fastpath+0x12/0x76 ... Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Morten Stevens <mstevens@fedoraproject.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
thehajime
pushed a commit
that referenced
this issue
Aug 14, 2015
It turns out that a PV domU also requires the "Xen PV" APIC driver. Otherwise, the flat driver is used and we get stuck in busy loops that never exit, such as in this stack trace: (gdb) target remote localhost:9999 Remote debugging using localhost:9999 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY) (gdb) bt #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 #1 __default_send_IPI_shortcut (shortcut=<optimized out>, dest=<optimized out>, vector=<optimized out>) at ./arch/x86/include/asm/ipi.h:75 #2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54 #3 0xffffffff81011336 in arch_irq_work_raise () at arch/x86/kernel/irq_work.c:47 #4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at kernel/irq_work.c:100 #5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633 #6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:1778 #7 0xffffffff816010c8 in printk (fmt=<optimized out>) at kernel/printk/printk.c:1868 #8 0xffffffffc00013ea in ?? () #9 0x0000000000000000 in ?? () Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
thehajime
pushed a commit
that referenced
this issue
Sep 2, 2015
…o next/dt mvebu dt changes for v4.3 (part #3) - device tree part of the Dove PMU series - converting a new orion5x based platform to dt: Linkstation Mini * tag 'mvebu-dt-4.3-3' of git://git.infradead.org/linux-mvebu: ARM: dts: Convert Linkstation Mini to Device Tree ARM: dt: dove: add GPU power domain description ARM: dt: dove: add video decoder power domain description ARM: dt: dove: wire up RTC interrupt ARM: dt: Add PMU node, making PMU child devices childs of this node Signed-off-by: Olof Johansson <olof@lixom.net>
thehajime
pushed a commit
that referenced
this issue
Sep 2, 2015
Due to patch "libfc: Do not invoke the response handler after fc_exch_done()" (commit ID 7030fd6) the lport_recv() call in fc_exch_recv_req() is passed a dangling pointer. Avoid this by moving the fc_frame_free() call from fc_invoke_resp() to its callers. This patch fixes the following crash: general protection fault: 0000 [#3] PREEMPT SMP RIP: fc_lport_recv_req+0x72/0x280 [libfc] Call Trace: fc_exch_recv+0x642/0xde0 [libfc] fcoe_percpu_receive_thread+0x46a/0x5ed [fcoe] kthread+0x10a/0x120 ret_from_fork+0x42/0x70 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
thehajime
pushed a commit
that referenced
this issue
Sep 2, 2015
A recent change to the cpu_cooling code introduced a AB-BA deadlock scenario between the cpufreq_policy_notifier_list rwsem and the cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held before the registration/removal of the notifier block (an operation which takes the rwsem), and the notifier code itself which takes the locks in the reverse order: ====================================================== [ INFO: possible circular locking dependency detected ] 3.18.0+ #1453 Not tainted ------------------------------------------------------- rc.local/770 is trying to acquire lock: (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc but task is already holding lock: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}: [<c06bc3b0>] down_write+0x44/0x9c [<c0043444>] blocking_notifier_chain_register+0x28/0xd8 [<c04ad610>] cpufreq_register_notifier+0x68/0x90 [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180 [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4 [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal] [<c037cef8>] platform_drv_probe+0x50/0xac [<c037b710>] driver_probe_device+0x114/0x234 [<c037b8cc>] __driver_attach+0x9c/0xa0 [<c0379d68>] bus_for_each_dev+0x5c/0x90 [<c037b204>] driver_attach+0x24/0x28 [<c037ae7c>] bus_add_driver+0xe0/0x1d8 [<c037c0cc>] driver_register+0x80/0xfc [<c037cd80>] __platform_driver_register+0x50/0x64 [<bf007018>] 0xbf007018 [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c0095da4>] load_module+0x1768/0x1ef8 [<c0096614>] SyS_init_module+0xe0/0xf4 [<c000ec00>] ret_fast_syscall+0x0/0x48 -> #0 (cooling_cpufreq_lock){+.+.+.}: [<c00619f8>] lock_acquire+0xb0/0x124 [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8 [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc [<c0042bf4>] notifier_call_chain+0x4c/0x8c [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68 [<c0042f58>] blocking_notifier_call_chain+0x20/0x28 [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0 [<c04af3cc>] store_scaling_governor+0x74/0x9c [<c04ad418>] store+0x90/0xc0 [<c0175384>] sysfs_kf_write+0x54/0x58 [<c01746b4>] kernfs_fop_write+0xdc/0x190 [<c010dcc0>] vfs_write+0xac/0x1b4 [<c010dfec>] SyS_write+0x44/0x90 [<c000ec00>] ret_fast_syscall+0x0/0x48 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); *** DEADLOCK *** 7 locks held by rc.local/770: #0: (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4 #1: (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190 #2: (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190 #3: (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90 #4: (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0 #5: (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0 #6: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 stack backtrace: CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c) r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000 [<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98) [<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8) r4:c0b85a80 r3:d0071d40 [<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0) r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240 [<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124) r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c r4:00000000 [<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8) r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000 r4:c04abfc4 [<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc) r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000 r4:fffffffe [<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c) r7:00000000 r6:00000000 r5:00000000 r4:fffffffe [<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68) r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff [<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28) r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c [<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0) [<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c) r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600 [<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0) r6:0000000c r5:ed76e6d4 r4:ed76e600 [<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58) r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c [<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190) r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330 [<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4) r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c r4:eccde500 [<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90) r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000 [<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48) r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70 Solve this by moving to finer grained locking - use one mutex to protect the cpufreq_dev_list as a whole, and a separate lock to ensure correct ordering of cpufreq notifier registration and removal. cooling_list_lock is taken within cooling_cpufreq_lock on (un)registration to preserve the behavior of the code, i.e. to atomically add/remove to the list and (un)register the notifier. Fixes: 2dcd851 ("thermal: cpu_cooling: Update always cpufreq policy with Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
thehajime
pushed a commit
that referenced
this issue
Sep 2, 2015
Hit the following splat testing VRF change for ipsec: [ 113.475692] =============================== [ 113.476194] [ INFO: suspicious RCU usage. ] [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted [ 113.477545] ------------------------------- [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section! [ 113.479288] [ 113.479288] other info that might help us debug this: [ 113.479288] [ 113.480207] [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1 [ 113.480931] 2 locks held by setkey/6829: [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213 [ 113.482509] #1: (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e [ 113.483509] [ 113.483509] stack backtrace: [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962 [ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154 [ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8 [ 113.489525] Call Trace: [ 113.489813] [<ffffffff81518af2>] dump_stack+0x4c/0x65 [ 113.490389] [<ffffffff81086962>] ? console_unlock+0x3d6/0x405 [ 113.491039] [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103 [ 113.491735] [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47 [ 113.492442] [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8 [ 113.493077] [<ffffffff81064268>] __might_sleep+0x6c/0x82 [ 113.493681] [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24 [ 113.494508] [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f [ 113.495149] [<ffffffff814012b5>] skb_clone+0x64/0x80 [ 113.495712] [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff [ 113.496380] [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e [ 113.497024] [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1 [ 113.497653] [<ffffffff814e9770>] pfkey_process+0x162/0x17e [ 113.498274] [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213 In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes the RCU lock. Since pfkey_broadcast takes the RCU lock the allocation argument is pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. The one call outside of rcu can be done with GFP_KERNEL. Fixes: 7f6b9db ("af_key: locking change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
thehajime
pushed a commit
that referenced
this issue
Jan 25, 2016
When a43eec3 ("bpf: introduce bpf_perf_event_output() helper") added PERF_COUNT_SW_BPF_OUTPUT we ended up with a new entry in the event_symbols_sw array that wasn't initialized, thus set to NULL, fix print_symbol_events() to check for that case so that we don't crash if this happens again. (gdb) bt #0 __match_glob (ignore_space=false, pat=<optimized out>, str=<optimized out>) at util/string.c:198 #1 strglobmatch (str=<optimized out>, pat=pat@entry=0x7fffffffe61d "stall") at util/string.c:252 #2 0x00000000004993a5 in print_symbol_events (type=1, syms=0x872880 <event_symbols_sw+160>, max=11, name_only=false, event_glob=0x7fffffffe61d "stall") at util/parse-events.c:1615 #3 print_events (event_glob=event_glob@entry=0x7fffffffe61d "stall", name_only=false) at util/parse-events.c:1675 #4 0x000000000042c79e in cmd_list (argc=1, argv=0x7fffffffe390, prefix=<optimized out>) at builtin-list.c:68 #5 0x00000000004788a5 in run_builtin (p=p@entry=0x871758 <commands+120>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe390) at perf.c:370 #6 0x0000000000420ab0 in handle_internal_command (argv=0x7fffffffe390, argc=2) at perf.c:429 #7 run_argv (argv=0x7fffffffe110, argcp=0x7fffffffe11c) at perf.c:473 #8 main (argc=2, argv=0x7fffffffe390) at perf.c:588 (gdb) p event_symbols_sw[PERF_COUNT_SW_BPF_OUTPUT] $4 = {symbol = 0x0, alias = 0x0} (gdb) A patch to robustify perf to not segfault when the next counter gets added in the kernel will follow this one. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-57wysblcjfrseb0zg5u7ek10@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
thehajime
pushed a commit
that referenced
this issue
Feb 2, 2016
The code produces the following trace: [1750924.419007] general protection fault: 0000 [#3] SMP [1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4 dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core [1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D 3.13.0-39-generic #66-Ubuntu [1750924.420364] Hardware name: Dell Computer Corporation PowerEdge 860/0XM089, BIOS A04 07/24/2007 [1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti: ffff88007af1c000 [1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib] [1750924.420364] RSP: 0018:ffff88007af1dd70 EFLAGS: 00010246 [1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX: 000000000000000f [1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI: 6764697200000000 [1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09: 0000000000000000 [1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12: ffff88007baa1d98 [1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15: 0000000000000000 [1750924.420364] FS: 00007ffff7fd8740(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 [1750924.420364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4: 00000000000007e0 [1750924.420364] Stack: [1750924.420364] ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429 000000007af1de20 [1750924.420364] ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70 ffffffffa00cb313 [1750924.420364] 00007fffffffde88 0000000000000000 0000000000000008 ffff88003ecab000 [1750924.420364] Call Trace: [1750924.420364] [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350 [ib_qib] [1750924.568035] [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0 [ib_uverbs] [1750924.568035] [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core] [1750924.568035] [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170 [ib_uverbs] [1750924.568035] [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs] [1750924.568035] [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20 [1750924.568035] [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0 [1750924.568035] [<ffffffff811bd214>] vfs_write+0xb4/0x1f0 [1750924.568035] [<ffffffff811bdc49>] SyS_write+0x49/0xa0 [1750924.568035] [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f [1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10 <f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f [1750924.568035] RIP [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib] [1750924.568035] RSP <ffff88007af1dd70> [1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ] The fix is to note the qib_mcast_qp that was found. If none is found, then return EINVAL indicating the error. Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
thehajime
pushed a commit
that referenced
this issue
Feb 2, 2016
This crash is caused by NULL pointer deference, in page_to_pfn() marco, when page == NULL : Unable to handle kernel NULL pointer dereference at virtual address 00000000 Internal error: Oops: 94000006 [#1] SMP Modules linked in: CPU: 1 PID: 26 Comm: khugepaged Tainted: G W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 PC is at khugepaged+0x378/0x1af8 LR is at khugepaged+0x418/0x1af8 Process khugepaged (pid: 26, stack limit = 0xffffffc079638020) Call trace: khugepaged+0x378/0x1af8 kthread+0xdc/0xf4 ret_from_fork+0xc/0x40 Code: 35001700 f0002c60 aa0703e3 f9009fa0 (f94000e0) ---[ end trace 637503d8e28ae69e ]--- Kernel panic - not syncing: Fatal exception CPU2: stopping CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 Hardware name: linux,dummy-virt (DT) [akpm@linux-foundation.org: fix fat-fingered merge resolution] Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
No description provided.
The text was updated successfully, but these errors were encountered: