Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Timer]: Spurious State Change of SNMP.timer handled #1

Closed
wants to merge 2 commits into from

Conversation

vivekrnv
Copy link
Owner

@vivekrnv vivekrnv commented Mar 16, 2021

Signed-off-by: Vivek Reddy Karri vivekreddykarri98@gmail.com

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Vivek Reddy Karri <vivekreddykarri98@gmail.com>
@dgsudharsan dgsudharsan self-requested a review March 16, 2021 00:28
Signed-off-by: Vivek Reddy Karri <vivekreddykarri98@gmail.com>
@liat-grozovik
Copy link

@vivekreddynv can you please provide more information of the motivation. if this is required for 202012 please check it can be cleanly cherry picked to 202012 and have it marked so on the public PR i will be able to add label and ask the release maintainer to cherry pick

@vivekrnv vivekrnv changed the title [Timer]: Spurious State Change of SNMP.timer fixed [Timer]: Spurious State Change of SNMP.timer handled Mar 16, 2021
@vivekrnv
Copy link
Owner Author

@vivekreddynv can you please provide more information of the motivation. if this is required for 202012 please check it can be cleanly cherry picked to 202012 and have it marked so on the public PR i will be able to add label and ask the release maintainer to cherry pick

Hi @liat-grozovik , Sorry for my bad wording but the issue is not fixed with this PR. The error still persists.

@vivekrnv vivekrnv closed this Mar 17, 2021
vivekrnv pushed a commit that referenced this pull request Nov 11, 2021
Allow mellanox platform to build and successfully switch packets in
Debian 11

Upgraded

* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches

Adjusted build system to support host system running bullseye and
dockers running buster.
vivekrnv pushed a commit that referenced this pull request Nov 11, 2021
* Make neccesary changed to mellanox platform code to build on Debian 11

* Revert use of backported kernel to build mft and elect to only build kernel module under bullseye
vivekrnv pushed a commit that referenced this pull request Dec 1, 2021
Submodule update for sonic-linkmgrd
Incorporates:

c11a576 (2021-11-22 09:38:46) [ci]: show code coverage in azure pipeline (#4)
4ceb01d (2021-11-18 20:24:20) Fix MUX toggling issue (#1)
d640527 (2021-11-12 22:31:44) [ci]: fix artifact download
b9f247d (2021-11-12 22:31:44) [ci]: use native arm64/armhf build
3059122 (2021-09-27 11:32:23) [linkgrd] Add Missing Apache License Header
vivekrnv pushed a commit that referenced this pull request Apr 4, 2022
Submodule update for sonic-linkmgrd
Incorporates:

c11a576 (2021-11-22 09:38:46) [ci]: show code coverage in azure pipeline (#4)
4ceb01d (2021-11-18 20:24:20) Fix MUX toggling issue (#1)
d640527 (2021-11-12 22:31:44) [ci]: fix artifact download
b9f247d (2021-11-12 22:31:44) [ci]: use native arm64/armhf build
3059122 (2021-09-27 11:32:23) [linkgrd] Add Missing Apache License Header

signed-off-by: Jing Zhang zhangjing@microsoft.com
vivekrnv pushed a commit that referenced this pull request May 10, 2022
…net#10291)

#### Why I did it

Fix issue: Non compliant leaf list in config_db schema: sonic-net#9801

#### How I did it

The basic flow of DPB is like:
1.	Transfer config db json value to YANG json value, name it “yangIn”
2.	Validate “yangIn” by libyang
3.	Generate a YANG json value to represent the target configuration, name it “yangTarget”
4.	Do diff between “yangIn” and “yangTarget”
5.	Apply the diff to CONFIG DB json and save it back to DB
 
The fix:
•	For step #1, If value of a leaf-list field string type, transfer it to a list by splitting it with “,” the purpose here is to make step#2 happy. We also need to save <table_name>.<key>.<field_name> to a set named “leaf_list_with_string_value_set”.
•	For step#5, loop “leaf_list_with_string_value_set” and change those fields back to a string.


#### How to verify it

1. Manual test
2. Changed sample config DB and unit test passed
vivekrnv pushed a commit that referenced this pull request May 26, 2022
…net#10291) (sonic-net#10768)

Fix issue: Non compliant leaf list in config_db schema: sonic-net#9801

The basic flow of DPB is like:
1.	Transfer config db json value to YANG json value, name it “yangIn”
2.	Validate “yangIn” by libyang
3.	Generate a YANG json value to represent the target configuration, name it “yangTarget”
4.	Do diff between “yangIn” and “yangTarget”
5.	Apply the diff to CONFIG DB json and save it back to DB

The fix:
•	For step #1, If value of a leaf-list field string type, transfer it to a list by splitting it with “,” the purpose here is to make step#2 happy. We also need to save <table_name>.<key>.<field_name> to a set named “leaf_list_with_string_value_set”.
•	For step#5, loop “leaf_list_with_string_value_set” and change those fields back to a string.

1. Manual test
2. Changed sample config DB and unit test passed

Conflicts:
	src/sonic-yang-mgmt/sonic_yang_ext.py
vivekrnv pushed a commit that referenced this pull request Aug 26, 2022
Currently, ASAN sometimes reports the BufferOrch::m_buffer_type_maps and QosOrch::m_qos_maps as leaked. However, their lifetime is the lifetime of a process so they are not really 'leaked'.
This also adds a simple way to add more suppressions later if required.

Example of ASAN report:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da9f789 in __static_initialization_and_destruction_0 /__w/2/s/orchagent/bufferorch.cpp:39
    #2 0x55ca1daa02af in _GLOBAL__sub_I_bufferorch.cpp /__w/2/s/orchagent/bufferorch.cpp:1321
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da6d2da in __static_initialization_and_destruction_0 /__w/2/s/orchagent/qosorch.cpp:80
    #2 0x55ca1da6ecf2 in _GLOBAL__sub_I_qosorch.cpp /__w/2/s/orchagent/qosorch.cpp:2000
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

- What I did
Added an lsan suppression config with static variable leak suppression

- Why I did it
To suppress ASAN false positives

- How I verified it
Run a test that produces the static variable leaks report and checked that report has these leaks suppressed.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
@vivekrnv vivekrnv deleted the timer_fix branch August 29, 2022 23:50
vivekrnv pushed a commit that referenced this pull request Feb 3, 2023
- Why I did it
To improve ASIC FW upgrade logging and have information about the cause of FW update failure in the log.

- How I did it
Added syslog logger support

In case the FW update has failed the update tool will give the cause of the failure in the output in the last line, starting with "Fail".
When running the tool, in case of a failed update, we will parse the output to retrieve the cause and log it.

Device #1:
 ----------
 
 Device Type:      ConnectX6DX
   Part Number:      MCX623106AN-CDA_Ax
   Description:      ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
   PSID:             MT_0000000359
   PCI Device Name:  /dev/mst/mt4125_pciconf0
   Base GUID:        0c42a103007d22d4
   Base MAC:         0c42a17d22d4
   Versions:         Current        Available     
      FW             22.32.0498     22.32.0498    
      PXE            3.6.0500       3.6.0500      
      UEFI           14.25.0015     14.25.0015    
 
 Status:           Forced update required
 
---------
 Found 1 device(s) requiring firmware update...
 
Device #1: Updating FW ...     
 FSMST_INITIALIZE -   OK          
 Writing Boot image component -   OK          
 Fail : The Digest in the signature is wrong

- How to verify it
mlnx-fw-upgrade.sh --upgrade
vivekrnv pushed a commit that referenced this pull request Apr 28, 2023
Fix sonic-net#6866

Unset CONFIG_THERMAL_STATISTICS.
Reason:
Kernel thermal zones binding to the cooling device together with CONFIG_THERMAL_STATISTICS=y causes kernel crash as out of boundary:
trans_table is two-dimensional table allocated per max cooling state (10).
If statistics is configured, thermal_cooling_device_stats_update() will be called and will try to update out of boundary:
stats->trans_table[stats->state * stats->max_states + new_state]++

Kernel crash with the following stack trace:

```
[  269.474092] watchdog: watchdog1: watchdog did not stop!
[  269.533625] list_del corruption. prev->next should be ffff9e136bd57418, but was 677ac660ffffffff

[  269.543482] kernel BUG at lib/list_debug.c:53!
[  269.548458] invalid opcode: 0000 [#1] SMP PTI
[  269.553326] CPU: 1 PID: 8890 Comm: kexec Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[  269.564891] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 11/03/2020
[  269.574323] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.580740] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  269.601726] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  269.607561] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  269.615531] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  269.623500] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  269.631470] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  269.639440] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  269.647410] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  269.656441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  269.662857] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  269.670820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  269.678790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  269.686760] Call Trace:
[  269.689489]  device_shutdown+0xc1/0x210
[  269.693773]  kernel_kexec+0x51/0x96
[  269.697666]  __do_sys_reboot+0x1be/0x210
[  269.702045]  ? kmem_cache_free+0x1aa/0x1d0
[  269.706618]  ? __dentry_kill+0x121/0x170
[  269.710998]  ? _cond_resched+0x15/0x30
[  269.715181]  ? dentry_kill+0x4d/0x190
[  269.719260]  ? _cond_resched+0x15/0x30
[  269.723444]  do_syscall_64+0x53/0x110
[  269.727531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  269.733172] RIP: 0033:0x7f97228a3373
[  269.737161] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 9a 0c 00 f7 d8
[  269.758147] RSP: 002b:00007ffe11d30fa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  269.766602] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f97228a3373
[  269.774572] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  269.782541] RBP: 0000000000000002 R08: 0000000000000004 R09: 000055cfdb69e160
[  269.790511] R10: fffffffffffffb8e R11: 0000000000000202 R12: 00007ffe11d31238
[  269.798482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[  269.806443] Modules linked in: nft_chain_route_ipv4(E) xt_TCPMSS(E) sx_bfd(OE) sx_netdev(OE) psample(E) dummy(E) sx_core(OE) 8021q(E) garp(E) mrp(E) mst_pciconf(OE) mst_pci(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) nft_compat(E) nft_counter(E) xt_conntrack(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) intel_rapl(E) mlxsw_minimal(E) sb_edac(E) mlxsw_i2c(E) x86_pkg_temp_thermal(E) mlxsw_core(E) intel_powerclamp(E) devlink(E) kvm_intel(E) bonding(E) kvm(E) i2c_mux_reg(E) i2c_mux(E) mlxreg_hotplug(E) mlxreg_io(E) leds_mlxreg(E) i2c_mlxcpld(E) mlxreg_fan(E) mxm_wmi(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) evdev(E) mlx_platform(E) ghash_clmulni_intel(E) intel_cstate(E) sg(E) intel_uncore(E) iTCO_wdt(E) pcspkr(E)
[  269.885239]  intel_rapl_perf(E) ioatdma(E) iTCO_vendor_support(E) pcc_cpufreq(E) wmi(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) nf_tables(E) button(E) nfnetlink(E) ebtable_filter(E) ebtables(E) xdpe12284(E) at24(E) ledtrig_timer(E) tmp102(E) lm75(E) coretemp(E) max1363(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) tps53679(E) pmbus(E) pmbus_core(E) i2c_dev(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) sd_mod(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E) xxhash(E) crc32c_intel(E) gpio_ich(E) ahci(E) aesni_intel(E) libahci(E) aes_x86_64(E) crypto_simd(E) xhci_pci(E) ehci_pci(E) libata(E) igb(E) ehci_hcd(E)
[  269.964036]  xhci_hcd(E) cryptd(E) glue_helper(E) scsi_mod(E) i2c_algo_bit(E) i2c_i801(E) lpc_ich(E) dca(E) mfd_core(E) usbcore(E) usb_common(E)
[  269.978536] ---[ end trace 8f56c678b52f9aee ]---
[  269.983698] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.990123] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  270.011117] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  270.016958] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  270.024935] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  270.032912] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  270.040890] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  270.048866] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  270.056844] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  270.065889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.072312] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  270.080289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.088268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
```

A temporary solution is to disable this config and to work with the linux community on fixing it. 
The solution requires fan driver update which is not trivial and will take some time to have it available on next-net before can be backported to SONiC linux-kernel.

It was tested on:
HwSKU: ACS-MSN2410
HwSKU: Mellanox-SN2700
vivekrnv pushed a commit that referenced this pull request Sep 27, 2023
…bors over iBGP Session (sonic-net#16705)

What I did:
Enable Sending BGP Community over internal neighbors over iBGP Session

Microsoft ADO: 25268695

Why I did:
Without this change BGP community send by e-BGP Peers are not carry-forward to other e-BGP peers.


str2-xxxx-lc1-2# show bgp ipv6  20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52141
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65500
    2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      Last update: Tue Sep 26 16:08:26 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52688
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65502
    3.3.3.6 from 3.3.3.6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      Last update: Tue Sep 26 15:45:51 2023

After the change

str2-xxxx-lc2-2(config)# router bgp 65100
str2-xxxx-lc2-2(config-router)# address-family ipv4
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V4 send-community
str2-xxxx-lc2-2(config-router-af)# exit
str2-xxxx-lc2-2(config-router)# address-family ipv6
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V6 send-community
str2-xxxx-lc1-2# show bgp ipv6  20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52400
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65500
    2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      **Community: 1111:1111**
      Last update: Tue Sep 26 16:10:19 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52947
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65502
    3.3.3.6 from 3.3.3.6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      **Community: 1111:1111**
      Last update: Tue Sep 26 16:10:09 2023

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
vivekrnv pushed a commit that referenced this pull request Oct 27, 2023
…kernel 6.1 and bookworm (sonic-net#16954)

* sonic-platform-modules-cel: broadcom: adapt for kernel 6.1 and bookworm

The i2c_driver->remove API declaration has been updated to return void instead
of int, as part of cleanup patches in 6.1. More details can be referred from
here: [1]. Update the remove API definition in the modules accordingly and
cleanup variables that go unused from the remove API.

Update python build commands for bookworm. The packaging based on calling
setup.py is deprecated and using build module/pip utility is the recommended
method for python packaging/installation. Further details can be referred to
from here: [2], [3]. The build module is picky about the package information file,
which needs to be either setup.py or pyproject.toml.

Additionally, fix formatting inconsistencies in debian/changelog reported by
`dh_installchangelogs` during the build.

Tested the changes by compiling the changes as below:

    make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
    sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
    cd platform/broadcom/sonic-platform-modules-cel
    KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

Also verified the python scripts under the sonic-platform-modules-cel with
pyflakes to ensure no new errors are flagged (with exception of unused modules).

References:
   [1] - torvalds/linux@ed5c2f5f
   [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
   [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/pddf: i2c: adapt for kernel 6.1 and bookworm

   * Fixup i2c_driver->remove API due to changes in the function
     prototype (ref: [1]).

   * Cleanup `MODULE_SUPPORTED_DEVICE` macros that were cleaned up in
     the upstream (ref: [2]).

   * Sanitize python packaging and installation using the `build` module
   instead of calling the setup.py directly (ref: [3]. [4]).

Tested the changes by compiling pddf module as below:

     make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
     sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
     cd platform/pddf/i2c
     KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

References:
    [1] - torvalds/linux@ed5c2f5f
    [2] - torvalds/linux@6417f031
    [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
    [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include platform-modules-cel in builds

With pddf modules patched for 6.1, platform-modules-cel can be compiled
and included in the final image.

Testing by building sonic-broadcom.bin/sonic-broadcom-dnx.bin.

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* pddf/i2c: revert correct rootdir for pip install

The pip install directory has been set to test-pkg1/ for testing the build and
incorrectly retained as is. Revert this to the correct path $(PACKAGE_PRE_NAME).

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include pddf/modules-cel in the base package

Without this change, the modules were built but not packaged in the final .bin.

The final sonic-broadcom.bin has been tested for bootup on Celestica's
Silverstone platform.

   admin@sonic:~$ uname -a
   Linux sonic 6.1.0-11-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
   admin@sonic:~$ show platform summary
   Platform: x86_64-cel_silverstone-r0
   HwSKU: Silverstone
   ASIC: broadcom
   ASIC Count: 1
   Serial Number: R4009B2F062504LK200024
   Model Number: N/A
   Hardware Revision: N/A
   admin@sonic:~$ show version | head

   SONiC Software Version: SONiC.g0aad6c67c-rachandr
   SONiC OS Version: 12
   Distribution: Debian 12.2
   Kernel: 6.1.0-11-2-amd64
   Build commit: 0aad6c67c
   Build date: Thu Oct 26 07:13:47 UTC 2023
   Built by: rachandr@AZUHPS14

   Platform: x86_64-cel_silverstone-r0

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

---------

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
vivekrnv pushed a commit that referenced this pull request Nov 7, 2023
…kernel 6.1 and bookworm (sonic-net#16954)

* sonic-platform-modules-cel: broadcom: adapt for kernel 6.1 and bookworm

The i2c_driver->remove API declaration has been updated to return void instead
of int, as part of cleanup patches in 6.1. More details can be referred from
here: [1]. Update the remove API definition in the modules accordingly and
cleanup variables that go unused from the remove API.

Update python build commands for bookworm. The packaging based on calling
setup.py is deprecated and using build module/pip utility is the recommended
method for python packaging/installation. Further details can be referred to
from here: [2], [3]. The build module is picky about the package information file,
which needs to be either setup.py or pyproject.toml.

Additionally, fix formatting inconsistencies in debian/changelog reported by
`dh_installchangelogs` during the build.

Tested the changes by compiling the changes as below:

    make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
    sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
    cd platform/broadcom/sonic-platform-modules-cel
    KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

Also verified the python scripts under the sonic-platform-modules-cel with
pyflakes to ensure no new errors are flagged (with exception of unused modules).

References:
   [1] - torvalds/linux@ed5c2f5f
   [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
   [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/pddf: i2c: adapt for kernel 6.1 and bookworm

   * Fixup i2c_driver->remove API due to changes in the function
     prototype (ref: [1]).

   * Cleanup `MODULE_SUPPORTED_DEVICE` macros that were cleaned up in
     the upstream (ref: [2]).

   * Sanitize python packaging and installation using the `build` module
   instead of calling the setup.py directly (ref: [3]. [4]).

Tested the changes by compiling pddf module as below:

     make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
     sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
     cd platform/pddf/i2c
     KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

References:
    [1] - torvalds/linux@ed5c2f5f
    [2] - torvalds/linux@6417f031
    [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
    [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include platform-modules-cel in builds

With pddf modules patched for 6.1, platform-modules-cel can be compiled
and included in the final image.

Testing by building sonic-broadcom.bin/sonic-broadcom-dnx.bin.

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* pddf/i2c: revert correct rootdir for pip install

The pip install directory has been set to test-pkg1/ for testing the build and
incorrectly retained as is. Revert this to the correct path $(PACKAGE_PRE_NAME).

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include pddf/modules-cel in the base package

Without this change, the modules were built but not packaged in the final .bin.

The final sonic-broadcom.bin has been tested for bootup on Celestica's
Silverstone platform.

   admin@sonic:~$ uname -a
   Linux sonic 6.1.0-11-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
   admin@sonic:~$ show platform summary
   Platform: x86_64-cel_silverstone-r0
   HwSKU: Silverstone
   ASIC: broadcom
   ASIC Count: 1
   Serial Number: R4009B2F062504LK200024
   Model Number: N/A
   Hardware Revision: N/A
   admin@sonic:~$ show version | head

   SONiC Software Version: SONiC.g0aad6c67c-rachandr
   SONiC OS Version: 12
   Distribution: Debian 12.2
   Kernel: 6.1.0-11-2-amd64
   Build commit: 0aad6c67c
   Build date: Thu Oct 26 07:13:47 UTC 2023
   Built by: rachandr@AZUHPS14

   Platform: x86_64-cel_silverstone-r0

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

---------

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
vivekrnv pushed a commit that referenced this pull request Nov 22, 2023
…kernel 6.1 and bookworm (sonic-net#16954)

* sonic-platform-modules-cel: broadcom: adapt for kernel 6.1 and bookworm

The i2c_driver->remove API declaration has been updated to return void instead
of int, as part of cleanup patches in 6.1. More details can be referred from
here: [1]. Update the remove API definition in the modules accordingly and
cleanup variables that go unused from the remove API.

Update python build commands for bookworm. The packaging based on calling
setup.py is deprecated and using build module/pip utility is the recommended
method for python packaging/installation. Further details can be referred to
from here: [2], [3]. The build module is picky about the package information file,
which needs to be either setup.py or pyproject.toml.

Additionally, fix formatting inconsistencies in debian/changelog reported by
`dh_installchangelogs` during the build.

Tested the changes by compiling the changes as below:

    make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
    sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
    cd platform/broadcom/sonic-platform-modules-cel
    KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

Also verified the python scripts under the sonic-platform-modules-cel with
pyflakes to ensure no new errors are flagged (with exception of unused modules).

References:
   [1] - torvalds/linux@ed5c2f5f
   [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
   [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/pddf: i2c: adapt for kernel 6.1 and bookworm

   * Fixup i2c_driver->remove API due to changes in the function
     prototype (ref: [1]).

   * Cleanup `MODULE_SUPPORTED_DEVICE` macros that were cleaned up in
     the upstream (ref: [2]).

   * Sanitize python packaging and installation using the `build` module
   instead of calling the setup.py directly (ref: [3]. [4]).

Tested the changes by compiling pddf module as below:

     make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
     sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
     cd platform/pddf/i2c
     KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage

References:
    [1] - torvalds/linux@ed5c2f5f
    [2] - torvalds/linux@6417f031
    [2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
    [3] - 0b20a48 (Update Python build commands for Bookworm, 2023-09-07)

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include platform-modules-cel in builds

With pddf modules patched for 6.1, platform-modules-cel can be compiled
and included in the final image.

Testing by building sonic-broadcom.bin/sonic-broadcom-dnx.bin.

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* pddf/i2c: revert correct rootdir for pip install

The pip install directory has been set to test-pkg1/ for testing the build and
incorrectly retained as is. Revert this to the correct path $(PACKAGE_PRE_NAME).

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

* platform/broadcom: include pddf/modules-cel in the base package

Without this change, the modules were built but not packaged in the final .bin.

The final sonic-broadcom.bin has been tested for bootup on Celestica's
Silverstone platform.

   admin@sonic:~$ uname -a
   Linux sonic 6.1.0-11-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
   admin@sonic:~$ show platform summary
   Platform: x86_64-cel_silverstone-r0
   HwSKU: Silverstone
   ASIC: broadcom
   ASIC Count: 1
   Serial Number: R4009B2F062504LK200024
   Model Number: N/A
   Hardware Revision: N/A
   admin@sonic:~$ show version | head

   SONiC Software Version: SONiC.g0aad6c67c-rachandr
   SONiC OS Version: 12
   Distribution: Debian 12.2
   Kernel: 6.1.0-11-2-amd64
   Build commit: 0aad6c67c
   Build date: Thu Oct 26 07:13:47 UTC 2023
   Built by: rachandr@AZUHPS14

   Platform: x86_64-cel_silverstone-r0

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>

---------

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
vivekrnv pushed a commit that referenced this pull request Mar 18, 2024
[build] Use public storage for public resources. (sonic-net#18038)
vivekrnv pushed a commit that referenced this pull request Aug 30, 2024
#### Why I did it

Dropping control character (message sent when XSUB connects to XPUB as part of ZMQ Proxy setup to notify that subscription has been made) in do capture has been flaky since control character is not guaranteed to be the first message sent if there are events (like event-down-ctr) being published to XSUB.

Scenarios

1) Control character is sent and is first message when starting capture service

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`eventd#eventd#eventd: :- do_capture: Received subscription message when XSUB connects to XPUB`

2) Events like event-down ctr is sent before control character

`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 17 sonic-events-host`
`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 0 0 4 0 0 0 1 d 103 {"sonic-events-host:event-stopped-ctr":{"ctr_name":"EVENTD","timestamp":"2024-08-27T00:02:51.407518Z"}} 1 r 36 3357542f-bae1-458f-a804-660e620d21f5 1 s 1 9 1 t 19 1724716971407591080`
`heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`do_capture: Received subscription message when XSUB connects to XPUB`

3) Control character is not sent at all

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`

4) Control character is delayed and not caught when starting capture service, but is then caught after causing deserialize error.

`do_capture: Receiving event from source: 22 serialization::archive 18 17 sonic-events-host, will read second part of event`
`deserialize: deserialize Failed: input stream errorstr[0:64]:(#1) data type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&`
`zmq_read_part: Failed to deserialize part rc=-2`
`zmq_read_part: last:errno=11`
`zmq_message_read: Failure to read part1 rc=-2`
`zmq_message_read: last:errno=11`

We can cover these scenarios by just dropping the control character inside zmq_message_read as part of events_common in swsscommon (different PR). In this PR we will remove such handling logic and make sure that empty events that will be sent by control character are ignored.

##### Work item tracking
- Microsoft ADO **(number only)**:28728116

#### How I did it

Remove logic for handling control character

#### How to verify it

UT and sonic-mgmt test cases.
vivekrnv pushed a commit that referenced this pull request Sep 20, 2024
#### Why I did it

Dropping control character (message sent when XSUB connects to XPUB as part of ZMQ Proxy setup to notify that subscription has been made) in do capture has been flaky since control character is not guaranteed to be the first message sent if there are events (like event-down-ctr) being published to XSUB.

Scenarios

1) Control character is sent and is first message when starting capture service

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`eventd#eventd#eventd: :- do_capture: Received subscription message when XSUB connects to XPUB`

2) Events like event-down ctr is sent before control character

`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 17 sonic-events-host`
`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 0 0 4 0 0 0 1 d 103 {"sonic-events-host:event-stopped-ctr":{"ctr_name":"EVENTD","timestamp":"2024-08-27T00:02:51.407518Z"}} 1 r 36 3357542f-bae1-458f-a804-660e620d21f5 1 s 1 9 1 t 19 1724716971407591080`
`heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`do_capture: Received subscription message when XSUB connects to XPUB`

3) Control character is not sent at all

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`

4) Control character is delayed and not caught when starting capture service, but is then caught after causing deserialize error.

`do_capture: Receiving event from source: 22 serialization::archive 18 17 sonic-events-host, will read second part of event`
`deserialize: deserialize Failed: input stream errorstr[0:64]:(#1) data type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&`
`zmq_read_part: Failed to deserialize part rc=-2`
`zmq_read_part: last:errno=11`
`zmq_message_read: Failure to read part1 rc=-2`
`zmq_message_read: last:errno=11`

We can cover these scenarios by just dropping the control character inside zmq_message_read as part of events_common in swsscommon (different PR). In this PR we will remove such handling logic and make sure that empty events that will be sent by control character are ignored.

##### Work item tracking
- Microsoft ADO **(number only)**:28728116

#### How I did it

Remove logic for handling control character

#### How to verify it

UT and sonic-mgmt test cases.
vivekrnv pushed a commit that referenced this pull request Dec 6, 2024
#### Why I did it

Dropping control character (message sent when XSUB connects to XPUB as part of ZMQ Proxy setup to notify that subscription has been made) in do capture has been flaky since control character is not guaranteed to be the first message sent if there are events (like event-down-ctr) being published to XSUB.

Scenarios

1) Control character is sent and is first message when starting capture service

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`eventd#eventd#eventd: :- do_capture: Received subscription message when XSUB connects to XPUB`

2) Events like event-down ctr is sent before control character

`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 17 sonic-events-host`
`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 0 0 4 0 0 0 1 d 103 {"sonic-events-host:event-stopped-ctr":{"ctr_name":"EVENTD","timestamp":"2024-08-27T00:02:51.407518Z"}} 1 r 36 3357542f-bae1-458f-a804-660e620d21f5 1 s 1 9 1 t 19 1724716971407591080`
`heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`do_capture: Received subscription message when XSUB connects to XPUB`

3) Control character is not sent at all

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`

4) Control character is delayed and not caught when starting capture service, but is then caught after causing deserialize error.

`do_capture: Receiving event from source: 22 serialization::archive 18 17 sonic-events-host, will read second part of event`
`deserialize: deserialize Failed: input stream errorstr[0:64]:(#1) data type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&`
`zmq_read_part: Failed to deserialize part rc=-2`
`zmq_read_part: last:errno=11`
`zmq_message_read: Failure to read part1 rc=-2`
`zmq_message_read: last:errno=11`

We can cover these scenarios by just dropping the control character inside zmq_message_read as part of events_common in swsscommon (different PR). In this PR we will remove such handling logic and make sure that empty events that will be sent by control character are ignored.

##### Work item tracking
- Microsoft ADO **(number only)**:28728116

#### How I did it

Remove logic for handling control character

#### How to verify it

UT and sonic-mgmt test cases.
vivekrnv pushed a commit that referenced this pull request Dec 6, 2024
To fix a statistical issue. The original fix was done in FRRouting/frr#17297. However to accommodate 8.5.4 the patch in the PR was added.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
vivekrnv pushed a commit that referenced this pull request Dec 18, 2024
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants