Oneplus/6.0.1 - "rmt_storage" #10

Open
wants to merge 2,552 commits into
base: 5.1.1

@gps3dx gps3dx commented Jun 5, 2016

Hello,
Both the OPO and the OPX got the source code for "rmt_storage", but it is missing for the OPT.
Please publicly release the relevant "rmt_storage" files (maybe rmt_storage_client.h?).
Much obliged.

Ashray Kulkarni and others added 30 commits September 8, 2015 19:47
During power collapse there is a race condition where the power thread
is trying to collapse venus while another thread is trying to map a
buffer. This could lead to an iommu failure and a crash, because during
power collapse we detach iommu, and mapping buffers needs iommu to be
attached. This change cancels the power collapse thread if it has not
started yet; if it is already executing, the change waits for the
thread to complete, and then we attach iommu before calling iommu map.

Change-Id: I9d294827e35d14b828715006a3b50a162d620995
Signed-off-by: Ashray Kulkarni <ashrayk@codeaurora.org>
The last place that confirms no event from response handler
thread is after session_clean. Hence exit the event_queue
after session_clean.

CRs-Fixed: 900211
Change-Id: I5978332f2a423d074a9631279fc4827b43b2e695
Signed-off-by: Praneeth Paladugu <ppaladug@codeaurora.org>
IRQs must be disabled while locking runqueues since an
interrupt may cause a runqueue lock to be acquired.

CRs-fixed: 828598
Change-Id: I08490607c6982451f7f240fb1166edacc23f3f52
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Add MultiMedia17 dai link for surround sound recording.

Change-Id: I0b98dc5bccac1c232a6153e42665245c07d018e7
Signed-off-by: Bhalchandra Gajare <gajare@codeaurora.org>
Codec-specific metadata is sent only for the first stream in gapless
playback. This causes an incorrect configuration to be set for the
second stream, and distortions are observed due to frame drops in adsp.
Add support to send the codec-specific format at the start of the
next stream in gapless playback.
Add bit rate to the wma codec data structure, as it can vary between
streams in gapless playback.

Change-Id: I39f34ea1addff720612fe3e06257e7d75889e574
Signed-off-by: Chaithanya Krishna Bacharaju <chaithan@codeaurora.org>
Signed-off-by: Fred Oh <fred@codeaurora.org>
Signed-off-by: Alexy Joseph <alexyj@codeaurora.org>
If there is a packet in flight during ssr, send failure and
handle destruction can happen simultaneously. Although the
handle is freed only after the transaction lists are empty,
the list check and wakeup happen without any locks held. This
may lead to a use-after-free crash when the wakeup is triggered
after the handle is freed.

Invoke wakeup and list check with appropriate lock being held.

CRs-Fixed: 899087
Change-Id: I38306cfe12e0c7ffb1f54b091a23cecb487ae9a0
Signed-off-by: Atish Kumar Patra <apatra@codeaurora.org>
Do not unload kmota TA when a process that loaded it
is killed. qseecomd loads the TA, but keystore
and gatekeeper reference it

Change-Id: Ice00b520c110f743b9b281cfc4598aede0251562
Acked-by: Barani Muthukumaran <bmuthuku@qti.qualcomm.com>
Signed-off-by: Zhen Kong <zkong@codeaurora.org>
.compat_ioctl32 is used to support custom ioctls for 64 bit kernel
and 32 bit userspace. Since vidc driver doesn't have any custom
controls, remove the definition.

When userspace calls some fuzzy ioctls, the V4L2 framework calls
compat_ioctl32 of the corresponding driver from v4l2_compat_ioctl32.
If the vidc driver defines compat_ioctl32 to be v4l2_compat_ioctl32,
then the v4l2_compat_ioctl32 function gets called infinitely.
Hence remove the definition so that the V4L2 framework returns failure
to the ioctl caller.

Change-Id: I8d0e7afb14c4fd16e877145c603283cb936b82e2
Signed-off-by: Praneeth Paladugu <ppaladug@codeaurora.org>
Add new NCI driver with support for NQxxxx controller.

This is a character device driver. It enables control of the
NQx NFC controller using the i2c bus and GPIOs.

Change-Id: Ide7ac51e4493c50127e75624006d359ae8b55981
Acked-by: Afroditi Ilioudi <ailioudi@qti.qualcomm.com>
Signed-off-by: Puneet Mishra <puneetm@codeaurora.org>
The charger sets the POWER_SUPPLY_PROP_CHARGE_DONE on the bms power
supply during the chg_term interrupt handler function regardless of
whether charging actually terminated. In the cases of an instant
recharge trigger or manually calling the interrupt handler, the
CHARGE_DONE property will be falsely set.

Fix this by checking the real time status bit of the termination
interrupt before setting the charge_done property on the bms power
supply.

CRs-Fixed: 899712
Change-Id: I27e5969e462caba644e58095d6885e3b7e3c4523
Signed-off-by: Xiaozhe Shi <xiaozhes@codeaurora.org>
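
The check described in the commit message above can be sketched in plain C as follows. This is a minimal userspace illustration only, not the driver's code: `struct charger_sketch`, `read_term_rt_status()` and `chg_term_handler_sketch()` are hypothetical names introduced here, and the real driver reads the status bit from hardware.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the charger state; not the real driver struct. */
struct charger_sketch {
	bool term_rt_status; /* real-time status bit of the termination interrupt */
	bool charge_done;    /* value that would be reported as CHARGE_DONE */
};

/* Hypothetical helper: return the live status bit rather than trusting the IRQ. */
static bool read_term_rt_status(const struct charger_sketch *chg)
{
	return chg->term_rt_status;
}

/*
 * Sketch of the termination handler. It may run on an instant recharge
 * trigger or be called manually, so charge done is only reported when the
 * real-time status bit says charging actually terminated.
 */
static void chg_term_handler_sketch(struct charger_sketch *chg)
{
	if (!read_term_rt_status(chg))
		return;
	chg->charge_done = true;
}
```

With this shape, a spurious call to the handler leaves `charge_done` untouched.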
Currently the maximum supported instances check is done in Venus.
When the instance count exceeds the supported limit, FW reports
session_error for the new session. This is different from a
normal session_error, and the driver needs to handle this error
differently. Instead, the driver can reject the new session
if it would exceed the supported instance count.
This makes error handling easier.

Change-Id: I283971b73286d3e3cb97ec1e1ce3811dda32b740
Signed-off-by: Praneeth Paladugu <ppaladug@codeaurora.org>
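
A driver-side limit check of the kind described above might look like the sketch below. It is an illustration under stated assumptions: the limit value `MAX_SUPPORTED_INSTANCES`, the `-EBUSY` error code, and the names `struct vidc_core_sketch` and `open_session_sketch()` are all made up for this example and are not taken from the vidc driver.

```c
#include <errno.h>

/* Assumed limit, chosen for illustration only. */
#define MAX_SUPPORTED_INSTANCES 16

/* Hypothetical stand-in for the per-core bookkeeping. */
struct vidc_core_sketch {
	int instance_count; /* number of currently open sessions */
};

/*
 * Reject a new session up front when the supported count is already
 * reached, instead of letting firmware report a special session_error
 * later. Returns 0 on success or a negative errno (assumed: -EBUSY).
 */
static int open_session_sketch(struct vidc_core_sketch *core)
{
	if (core->instance_count >= MAX_SUPPORTED_INSTANCES)
		return -EBUSY;
	core->instance_count++;
	return 0;
}
```

The caller then sees an ordinary open failure and needs no special session_error path.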
The TSB block registers are going to be protected by TZ. This
means they can no longer be accessed through CPU, as it will cause
XPU violation. Therefore, configure those registers through GPU
instead of CPU. The choice between CPU and GPU is dynamically
implemented in order to avoid hard dependency on TZ.

CRs-Fixed: 817449
Change-Id: Iffd443781632250afc17a795bdfaa475756bca8b
Signed-off-by: Harshdeep Dhatt <hdhatt@codeaurora.org>
Currently the service event notification is done based on the number of
services of <service:instance> type available. If a service of a specific
type exits and registers in quick succession, then the service event
notification gets suppressed without client's knowledge. The client then
tries to communicate with a stale/non-existent service.

Fix the race condition by maintaining a list of port addresses of all
available services.

Change-Id: I7f626c4bb2d3429b1b277d802e23e247100c371a
Signed-off-by: Atish Kumar Patra <apatra@codeaurora.org>
During concurrency to headphone or ear playback, some
of the class-H parameters are not being set which
would cause mute. Configure class-H parameters correctly
during hph+ear concurrency.

Change-Id: Ib27eda1166c2177d24a04574e3df9c3f64b9dec9
Signed-off-by: Phani Kumar Uppalapati <phaniu@codeaurora.org>
Linux Build Service Account and others added 11 commits January 19, 2016 02:03
…ranch

Change-Id: I0ed8f80a5c690fcb7b0b6fe5c2c765f34f700972
In AndroidKernel.mk the 2 targets TARGET_PREBUILT_INT_KERNEL
and KERNEL_HEADERS_INSTALL both depend on and modify the same
directory. Both targets always need to be rebuilt.
This patch solves this problem cutting down rebuilding time

Change-Id: I226ae9eae59c245406e573aced6362ba8af8800c
Signed-off-by: Mishra Mahima <mahima@codeaurora.org>
…ranch

Change-Id: Ia4270e23d8ae13630e14e35188660f79175e5e50
…ranch

Change-Id: Ib3618a4bf2769817a1be21ce9ee57b8db2b2a340
@leonfish77101

Sorry @gps3dx, rmt_storage belongs to Qualcomm; we have no right to release it publicly.

gps3dx commented Jun 21, 2016

@leonfish77101 - thank you very much for that clarification.
I'm not at all sure that the OPT community really needs that file's sources.
In the XDA community (link below), many OPT A2005 (US) owners are looking for a way to enable LTE band 3 (1800 MHz), but with NO success so far.

Personally, I live in a country where only LTE band 3 exists (the rest is 3G & 2G), but because I own the US (A2005) model, I cannot use LTE AT ALL.

Can OnePlus please offer a way to enable LTE band 3 on the OPT A2005?
(MORE ABOUT IT ON XDA: http://forum.xda-developers.com/oneplus-2/help/unlock-fdd-lte-band-3-1800mhz-oneplus-2-t3206369 )
AFAIK (you're welcome to correct me), the OPT A2005 HW supports LTE band 3, but it is disabled through SW configuration such as NV items and/or other means like "static_nvbk.bin" (i.e. the "/dev/block/bootdevice/by-name/oem_stanvbk" partition).

Any help in that matter will be much appreciated.
Thanks,
gps3dx ( on xda as well ).

@leonfish77101

@gps3dx The A2005 doesn't support band 3. The bands the A2005 supports are: B1, B2, B4, B7, B5, B8, B12, B17.
I think one thing needs to be clarified: the hardware of the US version is not the same as that of the European version. More differences are listed on the official site: https://oneplus.net/2/specs

gps3dx commented Jun 22, 2016

@leonfish77101 - thanks again for the reply.

AFAIK, the OPO US isn't the same as the OPO EU (international); that is obvious from your official site for the OPO.
Nonetheless, IT WAS PROVEN, beyond doubt, that LTE B3 CAN BE UNLOCKED on the OPO US model. PROOF: http://forum.xda-developers.com/showpost.php?p=56577252&postcount=455

Can you please explain in detail why it isn't possible to unlock LTE B3 on the OPT A2005, even though you publicly state that the OPT A2005 supports FDD-LTE at the HW level?

ONEPLUS's help in this matter, even if provided WITHOUT any warranty, could only increase OPT sales around the world and strengthen ONEPLUS's name as a company that listens to its customers' requests.

@leonfish77101

@gps3dx The OPT's antenna differs between the EU and US versions, and that is why LTE B3 cannot be enabled on the A2005.

bulju commented Jun 23, 2016

@leonfish77101 What bands are supported by the OPT's antenna in the A2003? I live in Argentina and I have an A2003, and in Argentina LTE works on bands 4 and 28.
Thank you
Regards

BobZmotion pushed a commit to BobZmotion/android_kernel_oneplus_msm8994 that referenced this pull request Jul 2, 2016
…antiated

commit 3c2e226 upstream.

arm:pxa_defconfig can result in the following crash if the max1111 driver
is not instantiated.

Unhandled fault: page domain fault (0x01b) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 1b [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 300 Comm: kworker/0:1 Not tainted 4.5.0-01301-g1701f680407c #10
Hardware name: SHARP Akita
Workqueue: events sharpsl_charge_toggle
task: c390a000 ti: c391e000 task.ti: c391e000
PC is at max1111_read_channel+0x20/0x30
LR is at sharpsl_pm_pxa_read_max1111+0x2c/0x3c
pc : [<c03aaab0>]    lr : [<c0024b50>]    psr: 20000013
...
[<c03aaab0>] (max1111_read_channel) from [<c0024b50>]
					(sharpsl_pm_pxa_read_max1111+0x2c/0x3c)
[<c0024b50>] (sharpsl_pm_pxa_read_max1111) from [<c00262e0>]
					(spitzpm_read_devdata+0x5c/0xc4)
[<c00262e0>] (spitzpm_read_devdata) from [<c0024094>]
					(sharpsl_check_battery_temp+0x78/0x110)
[<c0024094>] (sharpsl_check_battery_temp) from [<c0024f9c>]
					(sharpsl_charge_toggle+0x48/0x110)
[<c0024f9c>] (sharpsl_charge_toggle) from [<c004429c>]
					(process_one_work+0x14c/0x48c)
[<c004429c>] (process_one_work) from [<c0044618>] (worker_thread+0x3c/0x5d4)
[<c0044618>] (worker_thread) from [<c004a238>] (kthread+0xd0/0xec)
[<c004a238>] (kthread) from [<c000a670>] (ret_from_fork+0x14/0x24)

This can occur because the SPI controller driver (SPI_PXA2XX) is built as
module and thus not necessarily loaded. While building SPI_PXA2XX into the
kernel would make the problem disappear, it appears prudent to ensure that
the driver is instantiated before accessing its data structures.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
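
The guard the commit message argues for ("ensure that the driver is instantiated before accessing its data structures") can be illustrated with a small userspace sketch. The types and names below (`struct spi_dev_sketch`, `struct max1111_sketch`, `the_max1111`, `max1111_read_channel_sketch()`) are simplified stand-ins written for this example; they model the idea and are not a copy of the kernel driver.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the SPI device and the driver's private data. */
struct spi_dev_sketch {
	int last_sample; /* pretend ADC reading */
};

struct max1111_sketch {
	struct spi_dev_sketch *spi;
};

/* Stays NULL until the driver has been instantiated (probed). */
static struct max1111_sketch *the_max1111;

/*
 * Return -ENODEV instead of dereferencing a NULL pointer when the driver
 * (or its SPI device) is not there yet, e.g. because the SPI controller
 * is built as a module and has not been loaded.
 */
static int max1111_read_channel_sketch(int channel)
{
	if (!the_max1111 || !the_max1111->spi)
		return -ENODEV;
	(void)channel; /* the sketch ignores the channel number */
	return the_max1111->spi->last_sample;
}
```

A caller such as the battery check in the backtrace above would then receive an error code rather than faulting at address 0.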
Altaf-Mahdi pushed a commit to Altaf-Mahdi/android_kernel_oneplus_msm8994 that referenced this pull request Jul 8, 2016
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
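
The decision this commit changes can be sketched as a small pure function. This is a simplified reading of the commit message, not the kernel code: it ignores the separate kswapd case, and the names `enum wb_action` and `writeback_action()` as well as the three boolean inputs are assumptions made for this illustration.

```c
#include <stdbool.h>

/* What reclaim does with a page that is still under writeback (sketch). */
enum wb_action {
	WB_MARK_AND_SKIP, /* mark the page for reclaim and move on */
	WB_WAIT           /* block until writeback completes */
};

/*
 * Simplified decision for memcg reclaim hitting a PG_writeback page.
 * Waiting is only considered safe when reclaim may enter the filesystem
 * (may_enter_fs); otherwise the page might never have been submitted for
 * IO and the wait would never finish.
 */
static enum wb_action writeback_action(bool global_reclaim,
				       bool page_reclaim_set,
				       bool may_enter_fs)
{
	if (global_reclaim || !page_reclaim_set || !may_enter_fs)
		return WB_MARK_AND_SKIP;
	return WB_WAIT;
}
```

In the backtrace above the allocation comes from inside ext4, so `may_enter_fs` would be false and the sketch skips the page instead of waiting on it.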
Altaf-Mahdi pushed a commit to Altaf-Mahdi/android_kernel_oneplus_msm8994 that referenced this pull request Jul 8, 2016
…antiated

commit 3c2e226 upstream.


xdevs23 commented Aug 14, 2016

North America model
WCDMA: Bands 1/2/4/5/8 
FDD-LTE: Bands 1/2/4/5/7/8/12/17

@regalstreak

Of course it is different.

xdevs23 commented Aug 16, 2016

Lol 😂

bulju commented Aug 16, 2016

@xdevs23 @regalstreak Yes, they are disabled in software, but my question is about hardware compatibility. If you don't know, please don't spam the thread. Thanks.

@leonfish77101

@bulju The hardware is different

leonfish77101 and others added 3 commits September 9, 2016 18:05
Change-Id: Idea4c537f1625200bdd52efda6e8e00182b9f940
Signed-off-by: jiachunxu <jiachunxu@oneplus.cn>
Change-Id: I48f161416bc241810369e5e4efeb88f96f567fb5
List of changes:
1. Fix a phone crash that occurred while recording 4K video
2. Fix some stability and crash issues
3. Add some security patches

Change-Id: I1d03e03d42b95c14ebe7178da659e5dabef1cc68
CarbonGerritBot pushed a commit to CarbonROM/android_kernel_oneplus_msm8994 that referenced this pull request Jun 11, 2017
…antiated

commit 3c2e226 upstream.

CarbonGerritBot pushed a commit to CarbonROM/android_kernel_oneplus_msm8994 that referenced this pull request Jun 18, 2017
commit 45caeaa upstream.

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 static bool tcp_in_quickack_mode(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct dst_entry *dst = __sk_dst_get(sk);

 	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
 	       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
 }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host, creating
more dst_entrys with lower reference counts and making this more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
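
The fix described in the commit message above (skip the redirect while user space holds the socket lock) can be sketched in userspace C. Everything here is a hypothetical model: `struct sock_sketch`, `do_redirect_sketch()` and `icmp_redirect_sketch()` are names invented for this example, and the counter only stands in for the route update that the real redirect path performs.

```c
#include <stdbool.h>

/* Hypothetical model of the socket state relevant to the race. */
struct sock_sketch {
	bool owned_by_user;  /* user space currently holds the socket lock */
	int  redirects_done; /* how many times the redirect path actually ran */
};

/* Stand-in for the redirect work, which touches the cached dst entry. */
static void do_redirect_sketch(struct sock_sketch *sk)
{
	sk->redirects_done++;
}

/*
 * ICMP error path (sketch): only run the redirect when user space does
 * not own the socket, so the cached dst entry cannot be released twice
 * by two contexts working on the same socket.
 */
static void icmp_redirect_sketch(struct sock_sketch *sk)
{
	if (sk->owned_by_user)
		return; /* skipped now; a later redirect can be handled once unlocked */
	do_redirect_sketch(sk);
}
```

The cost of this choice is that a redirect arriving while the socket is locked is dropped for the moment, which the commit message accepts in exchange for removing the double release.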