idpf-linux: block changing ring params while af_xdp is active #25

Open
wants to merge 49 commits into base: idpf-libie-new

Commits on Jul 16, 2024

  1. netdevice: convert private flags > BIT(31) to bitfields

    Make dev->priv_flags `u32` back and define bits higher than 31 as
    bitfield booleans, as per Jakub's suggestion. This simplifies the code
    which accesses these bits with no optimization loss (testb both
    before/after), avoids having to extend &netdev_priv_flags each time,
    and also scales better: bits > 63 in the future would only add
    a new u64 to the structure with no complications, whereas extending
    ::priv_flags would require converting it to a bitmap.
    Note that I picked `unsigned long :1` to not lose any potential
    optimizations compared to `bool :1` etc.
    
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: d06d98d
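
    A minimal sketch of the layout this describes: former high feature bits
    become 1-bit bitfields next to the remaining u32 priv_flags. The struct
    name is illustrative; the flag names (lltx, netns_local, fcoe_mtu) come
    from the follow-up commits below.

	/* bits 0..31 stay in priv_flags; flags above bit 31 become
	 * bitfield booleans, so a check is still a single test insn
	 * and new flags just append fields instead of growing the enum */
	struct netdev_flags_sketch {
		u32 priv_flags;

		unsigned long lltx:1;
		unsigned long netns_local:1;
		unsigned long fcoe_mtu:1;
	};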
  2. netdev_features: remove unused __UNUSED_NETIF_F_1

    NETIF_F_NO_CSUM was removed in 3.2-rc2 by commit 34324dc
    ("net: remove NETIF_F_NO_CSUM feature bit") and became
    __UNUSED_NETIF_F_1. It's not used anywhere in the code.
    Remove this bit waste.
    
    There was no need to rename the flag instead of removing it, as
    netdev features are not uAPI/ABI. Ethtool passes their names
    and values separately with no fixed positions, and the userspace
    Ethtool code doesn't have any hardcoded feature names/bits, so
    a new Ethtool will work on older kernels and vice versa.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: 1c76f46
  3. netdev_features: convert NETIF_F_LLTX to dev->lltx

    NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
    but rather an attribute, very similar to IFF_NO_QUEUE (and hot).
    Free one netdev_features_t bit and make it a "hot" private flag.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: ad1f18e
  4. netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local

    "Interface can't change network namespaces" is rather an attribute,
    not a feature, and it can't be changed via Ethtool.
    Make it a "cold" private flag instead of a netdev_feature and free
    one more bit.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: 1e66f4b
  5. netdev_features: convert NETIF_F_FCOE_MTU to dev->fcoe_mtu

    The ability to handle maximum FCoE frames of 2158 bytes can never be
    changed and is thus more of an attribute than a toggleable feature.
    Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
    free yet another feature bit.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: 66bab81
  6. net: netdev_features: remove NETIF_F_ALL_FCOE

    NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only
    2 bits, open-code it and remove the definition from netdev_features.h.
    
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 16, 2024
    SHA: 44c8d76

Commits on Jul 17, 2024

  1. idpf: fix memory leaks and crashes while performing a soft reset

    The second tagged commit introduced a UAF, as it removed restoring
    q_vector->vport pointers after reinitializing the structures.
    This is because all queue allocation functions are performed here
    with the new temporary vport structure, and those functions rewrite
    the backpointers to the vport. Then, this new struct is freed and
    the pointers start pointing to freed memory.

    But generally speaking, the current logic is very fragile. It claims
    to be more reliable when the system is low on memory, but in fact, it
    consumes twice as much memory, as at the moment of running this
    function there are two vports allocated with their queues and vectors.
    Moreover, it claims to prevent the driver from running into a "bad state",
    but in fact, any error during the rebuild leaves the old vport in a
    partially allocated state.
    Finally, if the interface is down when the function is called, it always
    allocates a new queue set, but when the user decides to enable the
    interface later on, vport_open() allocates them once again, IOW there's
    a clear memory leak here.

    There's no one-liner way to fix all of this. Instead, rewrite the function
    from scratch without playing with two vports and memcpy()s. Just perform
    everything on the current structure and do the minimum set of work needed
    to rebuild the vport. Don't allocate the queues at all, as vport_open(),
    no matter whether it's called here or during the next ifup, will do
    that for us.
    
    Fixes: 02cbfba ("idpf: add ethtool callbacks")
    Fixes: e4891e4 ("idpf: split &idpf_queue into 4 strictly-typed queue structures")
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 1b85df9
  2. idpf: fix memleak in vport interrupt configuration

    The initialization of vport interrupts consists of two functions:
     1) idpf_vport_intr_init() where a generic configuration is done
     2) idpf_vport_intr_req_irq() where the irq for each q_vector is
       requested.

    The first function used to create a base name for each interrupt using
    a kasprintf() call. Unfortunately, although that call allocated memory
    for a text buffer, that memory was never released.

    Fix this by no longer creating the interrupt base name in 1).
    Instead, always create a full interrupt name in function 2), because
    there is no need to create a base name separately, considering that
    function 2) is never called outside of the idpf_vport_intr_init() context.
    
    Fixes: d4d5587 ("idpf: initialize interrupts and enable vport")
    Cc: stable@vger.kernel.org # 6.7
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: e2cb9d7
  3. idpf: fix UAFs when destroying the queues

    The second tagged commit started sometimes (very rarely, but still)
    throwing WARNs from
    net/core/page_pool.c:page_pool_disable_direct_recycling().
    It turned out that idpf frees interrupt vectors with embedded NAPIs
    *before* freeing the queues, making the page_pools' NAPI pointers lead
    to freed memory before these pools are destroyed by libeth.
    It's not clear whether there are other accesses to the freed vectors
    when destroying the queues, but anyway, we usually free queue/interrupt
    vectors only when the queues are destroyed and the NAPIs are guaranteed
    not to be referenced anywhere.

    Invert the allocation and freeing logic so that queue/interrupt vectors
    are allocated first and freed last. Vectors don't require queues to be
    present, so this is safe. Additionally, this change allows removing
    the useless queue->q_vector pointer cleanup, as vectors are still
    valid when freeing the queues (+ both are freed within one function,
    so it's not clear why the pointers were nullified at all).
    
    Fixes: 1c325aa ("idpf: configure resources for TX queues")
    Fixes: 90912f9 ("idpf: convert header split mode to libeth + napi_build_skb()")
    Reported-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 5f5e2bd
  4. unroll: add generic loop unroll helpers

    There are cases when we need to explicitly unroll loops: for example,
    cache operations, filling DMA descriptors at very high speeds, etc.
    Make MIPS' unroll header a generic one to have an "unroll always" macro,
    which works on any compiler and system, and add compiler-specific
    attribute macros.
    Example usage:
    
     #define UNROLL_BATCH 8
    
    	unrolled_count(UNROLL_BATCH)
    	for (u32 i = 0; i < UNROLL_BATCH; i++)
    		op(var, i);
    
    Note that sometimes compilers won't unroll loops if they think that
    would result in worse optimization and performance than keeping the
    loop, and that the unroll attributes are available only starting with
    GCC 8. In this case, you can still use unrolled_call(UNROLL_BATCH, op),
    which works in the range of [1...32] iterations.
    For better unrolling/parallelization, don't have any variables that
    carry dependencies between iterations except for the iterator itself.
    
    Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
    Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
    Co-developed-by: Paul Burton <paulburton@kernel.org> # unrolled_call()
    Signed-off-by: Paul Burton <paulburton@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: f0145fb
  5. libeth: add common queue stats

    Define common structures, inline helpers and Ethtool helpers to collect,
    update and export the statistics (RQ, SQ, XDPSQ). Use u64_stats_t right
    from the start, as well as the corresponding helpers to ensure
    tear-free operations.
    For the NAPI parts of both Rx and Tx, also define small onstack
    containers to update them in polling loops and then sync the actual
    containers once a loop ends.
    In order to implement fully generic Netlink per-queue stats callbacks,
    &libeth_netdev_priv is introduced and is required to be embedded at the
    start of the driver's netdev_priv structure.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 86249af
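
    A minimal layout sketch of the embedding requirement mentioned above;
    the driver struct and member names are illustrative:

	/* &libeth_netdev_priv must be the first member, so the generic
	 * Netlink per-queue stats callbacks can find it via netdev_priv() */
	struct drv_netdev_priv {
		struct libeth_netdev_priv base;

		/* driver-private fields follow */
	};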
  6. libie: add Tx buffer completion helpers

    Software-side Tx buffers for storing DMA, frame size, skb pointers etc.
    are pretty much generic and every driver defines them the same way. The
    same can be said for software Tx completions -- same napi_consume_skb()s
    and all that...
    Add a couple of simple wrappers for doing that to stop repeating the old
    tale, at least within the Intel code. Drivers are free to use the 'priv'
    member at the end of the structure.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 7c49fd3
  7. idpf: convert to libie Tx buffer completion

    &idpf_tx_buffer is almost identical to the previous generations, as is
    the way it's handled. Moreover, relying on dma_unmap_addr() and
    !!buf->skb instead of explicitly defining the buffer's type was never good.
    Use the newly added libie helpers to do it properly and reduce the
    copy-paste around the Tx code.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: c385b1f
  8. netdevice: add netdev_tx_reset_subqueue() shorthand

    Add a shorthand similar to other net*_subqueue() helpers for resetting
    a queue by its index w/o manually obtaining the &netdev_tx_queue
    beforehand.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 9194f24
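
    A minimal sketch of such a shorthand, assuming it simply wraps the
    existing helpers:

	static inline void netdev_tx_reset_subqueue(struct net_device *dev,
						    u32 qid)
	{
		netdev_tx_reset_queue(netdev_get_tx_queue(dev, qid));
	}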
  9. idpf: refactor Tx completion routines

    This patch adds a mechanism to guard against stashing partial packets
    into the hash table. This makes the driver more robust and leads to more
    efficient decision-making when cleaning.
    
    Don't stash partial packets. This can happen when an RE completion is
    received in flow scheduling mode, or when an out of order RS completion
    is received. The first buffer with the skb is stashed, but some or all
    of its frags are not because the stack is out of reserve buffers. This
    leaves the ring in a weird state since the frags are still on the ring.
    
    Use the field to track the number of fragments/tx_bufs representing the
    packet. The clean routines check to make sure there are enough reserve
    buffers on the stack before stashing any part of the packet. If there
    are not, next_to_clean is left pointing to the first buffer of the
    packet that failed to be stashed. This leaves the whole packet on the
    ring, and the next time around, cleaning will start from this packet.
    
    An RS completion is still expected for this packet in either case. So
    instead of being cleaned from the hash table, it will be cleaned from
    the ring directly.  This should all still be fine since the DESC_UNUSED
    and BUFS_UNUSED will reflect the state of the ring. If we ever fall
    below the thresholds, the TXQ will still be stopped, giving the
    completion queue time to catch up.  This may lead to stopping the queue
    more frequently, but it guarantees the TX ring will always be in a good
    state.
    
    Also, always use the idpf_tx_splitq_clean function to clean descriptors,
    i.e. use it from clean_buf_ring as well. This way we avoid duplicating
    the logic and make sure we're using the same reserve buffers guard rail.
    
    This does require a switch from the s16 next_to_clean overflow
    descriptor ring wrap calculation to u16 and the normal ring size check.
    
    Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    jahay1 authored and alobakin committed Jul 17, 2024
    SHA: d0bd302
  10. idpf: fix netdev Tx queue stop/wake

    netif_txq_maybe_stop() returns -1, 0, or 1, while
    idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result,
    there are sometimes Tx queue timeout warnings even though the queue
    is empty or there is at least enough space to restart it.
    Make idpf_tx_maybe_stop_common() inline and have it return true or false,
    handling the return value of netif_txq_maybe_stop() properly. Use a correct
    goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or
    incrementing the stops counter twice.
    
    Fixes: 6818c4d ("idpf: add splitq start_xmit")
    Fixes: a5ab9ee ("idpf: add singleq start_xmit and napi poll")
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: cc9628b
  11. idpf: enable WB_ON_ITR

    Tell hardware to write back completed descriptors even when interrupts
    are disabled. Otherwise, descriptors might not be written back until
    the hardware can flush a full cacheline of descriptors. This can cause
    unnecessary delays when traffic is light (or even trigger Tx queue
    timeout).
    
    An example scenario to reproduce the Tx timeout if the fix is not
    applied:
      - configure at least 2 Tx queues to be assigned to the same q_vector,
      - generate heavy Tx traffic on the first Tx queue,
      - try to send a few packets using the second Tx queue.
    In such a case, a Tx timeout will appear on the second Tx queue because no
    completion descriptors are written back for that queue while interrupts
    are disabled due to NAPI polling.
    
    The patch is necessary to start work on the AF_XDP implementation for
    the idpf driver, because there may be a case where a regular LAN Tx
    queue and an XDP queue share the same NAPI.
    
    Fixes: c2d548c ("idpf: add TX splitq napi poll support")
    Fixes: a5ab9ee ("idpf: add singleq start_xmit and napi poll")
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
    Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    jahay1 authored and alobakin committed Jul 17, 2024
    SHA: 6cae557
  12. idpf: switch to libeth generic statistics

    Fully reimplement idpf's per-queue stats using the libeth infra.
    Embed &libeth_netdev_priv at the beginning of &idpf_netdev_priv(),
    call the necessary init/deinit helpers and the corresponding Ethtool
    helpers.
    Update hotpath counters such as hsplit and tso/gso using the onstack
    containers instead of direct accesses to queue->stats.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 56c1d9b
  13. bpf, xdp: constify some bpf_prog * function arguments

    In lots of places, bpf_prog pointer is used only for tracing or other
    stuff that doesn't modify the structure itself. Same for net_device.
    Address at least some of them and add `const` attributes there. The
    object code didn't change, but that may prevent unwanted data
    modifications and also allow more helpers to have const arguments.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 1db8f94
  14. xdp, xsk: constify read-only arguments of some static inline helpers

    Lots of read-only helpers for &xdp_buff and &xdp_frame, such as getting
    the frame length, skb_shared_info etc., don't have their arguments
    marked with `const` for no good reason. Add the missing annotations to
    leave less room for mistakes and more room for optimization.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: afcb93d
  15. xdp: allow attaching already registered memory model to xdp_rxq_info

    One may need to register a memory model separately from xdp_rxq_info. One
    simple example is the XDP test run code, but in general, it might be
    useful when memory model registration is managed by one layer and the
    XDP RxQ info by a different one.
    Allow such scenarios by adding a simple helper which "attaches" an
    already registered memory model to the desired xdp_rxq_info. As this
    is mostly needed for Page Pool, add a special function to do that for
    a &page_pool pointer.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: da6f0f1
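
    A driver-side usage sketch; xdp_rxq_info_attach_page_pool() is the
    Page Pool variant mentioned above (also referenced by a later libeth
    commit), the rxq fields are illustrative and error handling is omitted:

	xdp_rxq_info_reg(&rxq->xdp_rxq, netdev, rxq->idx, napi_id);
	/* attach the already registered &page_pool memory model */
	xdp_rxq_info_attach_page_pool(&rxq->xdp_rxq, rxq->pp);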
  16. net: Register system page pool as an XDP memory model

    To make the system page pool usable as a source for allocating XDP
    frames, we need to register it with xdp_reg_mem_model(), so that page
    return works correctly. This is done in preparation for using the system
    page pool for the XDP live frame mode in BPF_TEST_RUN; for the same
    reason, make the per-cpu variable non-static so we can access it from
    the test_run code as well.
    
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
    tohojo authored and alobakin committed Jul 17, 2024
    SHA: 1859320
  17. page_pool: make page_pool_put_page_bulk() actually handle array of pages

    Currently, page_pool_put_page_bulk() indeed takes an array of pointers
    to the data, not pages, despite the name. As one side effect, when
    you're freeing frags from &skb_shared_info, xdp_return_frame_bulk()
    converts page pointers to virtual addresses and then
    page_pool_put_page_bulk() converts them back.
    Make page_pool_put_page_bulk() actually handle an array of pages. Pass
    frags directly and use virt_to_page() when freeing xdpf->data, so that
    the PP core will then get the compound head and take care of the rest.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 2cd561a
  18. page_pool: allow mixing PPs within one bulk

    The main reason for this change was to allow mixing pages from different
    &page_pools within one &xdp_buff/&xdp_frame. Why not?
    Adjust xdp_return_frame_bulk() and page_pool_put_page_bulk(), so that
    they won't be tied to a particular pool. Let the latter splice the
    bulk when it encounters a page whose PP is different and flush it
    recursively.
    This greatly optimizes xdp_return_frame_bulk(): no more hashtable
    lookups. Also make xdp_flush_frame_bulk() inline, as it's just one if +
    a function call + one u32 read, not worth extending the call ladder.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 23c4f5c
  19. xdp: get rid of xdp_frame::mem.id

    Initially, xdp_frame::mem.id was used to search for the corresponding
    &page_pool to return the page correctly.
    However, now that struct page contains a direct pointer to its PP,
    keeping this field any longer makes no sense. xdp_return_frame_bulk()
    still uses it to do a lookup, but this is rather a leftover.
    Remove xdp_frame::mem and replace it with ::mem_type, as only the memory
    type still matters and we need to know it to be able to free the frame
    correctly.
    As a cute side effect, we can now make every scalar field in &xdp_frame
    4 bytes wide, speeding up accesses to them.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 92b506a
  20. xdp: add generic xdp_buff_add_frag()

    The code piece which would attach a frag to &xdp_buff is almost
    identical across the drivers supporting XDP multi-buffer on Rx.
    Make it a generic, elegant one-liner.
    Also, I see lots of drivers calculating frags_truesize as
    `xdp->frame_sz * nr_frags`. I can't say this is fully correct, since
    frags might be backed by chunks of different sizes, especially with
    stuff like the header split. Even page_pool_alloc() can give you two
    different truesizes on two subsequent requests to allocate the same
    buffer size. Add a field to &skb_shared_info (unionized as there's no
    free slot currently on x86_64) to track the "true" truesize. It can be
    used later when updating an skb.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: d5f4287
  21. xdp: add generic xdp_build_skb_from_buff()

    The code which builds an skb from an &xdp_buff keeps multiplying itself
    around the drivers with almost no changes. Let's try to stop that by
    adding a generic function.
    There's __xdp_build_skb_from_frame() already, so just convert it to take
    an &xdp_buff instead, while making the original one a wrapper. The original
    one always took an already allocated skb; allow both variants here -- if
    no skb is passed, which is expected when calling from a driver, pick one via
    napi_build_skb().
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 3f75758
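
    A minimal Rx-path usage sketch of the new helper (error/drop handling
    is driver-specific and only hinted at here):

	skb = xdp_build_skb_from_buff(&xdp);
	if (unlikely(!skb)) {
		/* recycle the buffer and count the drop */
		break;
	}

	napi_gro_receive(napi, skb);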
  22. xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model()

    When you register an XSk pool as XDP Rxq info memory model, you then
    need to manually attach it after the registration.
    Let the user combine both actions into one by just passing a pointer
    to the pool directly to xdp_rxq_info_reg_mem_model(), which will take
    care of calling xsk_pool_set_rxq_info(). This looks similar to how a
    &page_pool gets registered and reduces repeated driver code.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 1b659d2
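
    A driver-side sketch of the combined registration described above:

	/* registers the memory model and calls xsk_pool_set_rxq_info() */
	err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
					 MEM_TYPE_XSK_BUFF_POOL, pool);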
  23. xsk: make xsk_buff_add_frag really add a frag via __xdp_buff_add_frag()

    Currently, xsk_buff_add_frag() only adds a frag to the pool linked list,
    not doing anything with the &xdp_buff. The drivers do that manually and
    the logic is the same.
    Make it really add an skb frag, just like xdp_buff_add_frag() does,
    and free frags on error if needed. This allows removing repeated
    code from i40e and ice instead of adding the same code again and again.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: bddf7b1
  24. xsk: add generic XSk &xdp_buff -> skb conversion

    Same as with converting &xdp_buff to skb on Rx, the code which allocates
    a new skb and copies the XSk frame there is identical across the
    drivers, so make it generic. This includes copying all the frags if they
    are present in the original buff.
    System percpu Page Pools help here a lot: when available, allocate pages
    from there instead of the MM layer. This greatly improves XDP_PASS
    performance on XSk: instead of page_alloc() + page_free(), the net core
    recycles the same pages, so the only overhead left is memcpy()s.
    Note that the passed buff gets freed if the conversion is done w/o any
    error, assuming you don't need this buffer after you convert it to an
    skb.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 5f5c62c
  25. xsk: add helper to get &xdp_desc's DMA and meta pointer in one go

    Currently, when you send an XSk frame without metadata, you need to do
    the following:
    
    * call external xsk_buff_raw_get_dma();
    * call inline xsk_buff_get_metadata(), which calls external
      xsk_buff_raw_get_data() and then do some inline checks.
    
    This effectively means that the following piece:
    
    addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
    
    is done twice per frame, plus you have 2 external calls per frame, plus
    this:
    
    	meta = pool->addrs + addr - pool->tx_metadata_len;
    	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
    
    is always inlined, even if there's no meta or it's invalid.
    
    Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
    in one go. It returns a small structure with 2 fields: DMA address,
    filled unconditionally, and metadata pointer, valid only if it's
    present. The address correction is performed only once and you also
    have only 1 external call per XSk frame, which does all the calculations
    and checks outside of your hotpath. You only need to check
    `if (ctx.meta)` for the metadata presence.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 3908174
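
    A rough sketch of the idea; the struct name, field names and driver
    hook below are assumptions, only the two-field shape is taken from the
    text above:

	struct xsk_tx_ctx_sketch {
		dma_addr_t dma;		/* always filled */
		void *meta;		/* NULL when no valid metadata */
	};

	ctx = xsk_buff_raw_get_ctx(pool, desc->addr);
	if (ctx.meta)
		fill_tx_metadata(ctx.meta);	/* hypothetical driver hook */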
  26. skbuff: allow 2-4-argument skb_frag_dma_map()

    skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE)
    is repeated across dozens of drivers and really wants a shorthand.
    Add a macro which will count args and handle all possible number
    from 2 to 5. Semantics:
    
    skb_frag_dma_map(dev, frag) ->
    __skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE)
    
    skb_frag_dma_map(dev, frag, offset) ->
    __skb_frag_dma_map(dev, frag, offset, skb_frag_size(frag) - offset,
    		   DMA_TO_DEVICE)
    
    skb_frag_dma_map(dev, frag, offset, size) ->
    __skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE)
    
    skb_frag_dma_map(dev, frag, offset, size, dir) ->
    __skb_frag_dma_map(dev, frag, offset, size, dir)
    
    No object code size changes for the existing callers. Users passing
    fewer arguments also won't get a bigger size compared to the full
    equivalent call.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 2b9703f
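
    A minimal sketch of the argument-counting dispatch implied by the
    semantics above; COUNT_ARGS()/CONCATENATE() are the usual kernel
    helpers, the numbered macro names are assumptions:

	#define skb_frag_dma_map(dev, frag, ...)			       \
		CONCATENATE(__skb_frag_dma_map,				       \
			    COUNT_ARGS(dev, frag, ##__VA_ARGS__))	       \
			   (dev, frag, ##__VA_ARGS__)

	#define __skb_frag_dma_map2(dev, frag)				       \
		__skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag),	       \
				   DMA_TO_DEVICE)
	#define __skb_frag_dma_map3(dev, frag, offset)			       \
		__skb_frag_dma_map(dev, frag, offset,			       \
				   skb_frag_size(frag) - (offset),	       \
				   DMA_TO_DEVICE)
	#define __skb_frag_dma_map4(dev, frag, offset, size)		       \
		__skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE)
	#define __skb_frag_dma_map5(dev, frag, offset, size, dir)	       \
		__skb_frag_dma_map(dev, frag, offset, size, dir)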
  27. jump_label: export static_key_slow_{inc,dec}_cpuslocked()

    Sometimes, there's a need to modify a lot of static keys or modify the
    same key multiple times in a loop. In that case, it seems more optimal
    to take cpus_read_lock() once and then call the _cpuslocked() variants.
    The enable/disable functions are already exported, but the refcounted
    counterparts are not. Fix that to allow modules to save some
    cycles.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: ae28117
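
    A usage sketch of the pattern described above:

	cpus_read_lock();

	for (u32 i = 0; i < num; i++)
		static_key_slow_inc_cpuslocked(&keys[i]);

	cpus_read_unlock();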
  28. libeth: support native XDP and register memory model

    Expand libeth's Page Pool functionality by adding native XDP support.
    This means picking the appropriate headroom and DMA direction.
    Also, register all the created &page_pools as XDP memory models.
    A driver then can call xdp_rxq_info_attach_page_pool() when registering
    its RxQ info.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 35d3947
  29. libeth: add a couple of XDP helpers (libeth_xdp)

    "Couple" is a bit humbly... Add the following functionality to libeth:
    
    * XDP shared queues managing
    * XDP_TX bulk sending infra
    * .ndo_xdp_xmit() infra
    * adding buffers to &xdp_buff
    * running XDP prog and managing its verdict
    * completing XDP Tx buffers
    * ^ repeat everything for XSk
    
    Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # lots of stuff
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 6f307ba
  30. idpf: make complq cleaning dependent on scheduling mode

    Extend completion queue cleaning function to support queue-based
    scheduling mode needed for XDP queues.
    Add 4-byte descriptor for queue-based scheduling mode and
    perform some refactoring to extract the common code for
    both scheduling modes.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: a6d1416
  31. idpf: remove SW marker handling from NAPI

    SW marker descriptors on completion queues are used only when a queue
    is about to be destroyed. It's far from hotpath and handling it in the
    hotpath NAPI poll makes no sense.
    Instead, run a simple poller after a virtchnl message for destroying
    the queue is sent and wait for the replies. If replies for all of the
    queues are received, this means the synchronization is done correctly
    and we can proceed with stopping the link.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: 1ecd090
  32. idpf: prepare structures to support xdp

    Extend basic structures of the driver (e.g. 'idpf_vport', 'idpf_*_queue',
    'idpf_vport_user_config_data') by adding members necessary to support XDP.
    Add extra XDP Tx queues needed to support XDP_TX and XDP_REDIRECT actions
    without interfering with regular Tx traffic.
    Also add functions dedicated to support XDP initialization for Rx and
    Tx queues and call those functions from the existing algorithms of
    queues configuration.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Co-developed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: 707e479
  33. idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq

    Implement loading the XDP program using ndo_bpf
    callback for splitq and XDP_SETUP_PROG parameter.
    
    Add functions for stopping, reconfiguring and restarting
    all queues when needed.
    Also, implement the XDP hot swap mechanism when the existing
    XDP program is replaced by another one (without a necessity
    of reconfiguring anything).
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: 8a6f7e0
  34. idpf: use generic functions to build xdp_buff and skb

    In preparation for XDP support, move from having an skb as the main frame
    container during Rx polling to &xdp_buff.
    This allows using generic and libie helpers for building an XDP buffer
    and changes the logic: now we try to allocate an skb only when we have
    processed all the descriptors related to the frame.
    Store &libeth_xdp_stash instead of the skb pointer on the Rx queue.
    It's only 8 bytes wider and there's room to fit it in.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 7aa5b1a
  35. idpf: add support for XDP on Rx

    Use the libeth XDP infra to support running an XDP program during Rx
    polling. This includes all of the possible verdicts/actions.
    XDP Tx queues are cleaned only in "lazy" mode, when fewer than 1/4 of
    the descriptors are left free on the ring. libeth helper macros to
    define driver-specific XDP functions make sure the compiler can uninline
    them when needed.
    Use __LIBETH_WORD_ACCESS to parse descriptors more efficiently when
    applicable. It really gives some good boosts and code size reduction
    on x86_64.
    
    Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 242d36e
  36. idpf: add support for .ndo_xdp_xmit()

    Use libeth XDP infra to implement .ndo_xdp_xmit() in idpf.
    The Tx callbacks are reused from XDP_TX code. XDP redirect target
    feature is set/cleared depending on the XDP prog presence, as for now
    we still don't allocate XDP Tx queues when there's no program.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 7e0b8ef
  37. idpf: add XDP RSS hash hint

    Add &xdp_metadata_ops with a callback to get RSS hash hint from the
    descriptor. Declare the splitq 32-byte descriptor as 4 u64s to parse
    them more efficiently when possible.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 74ed675
  38. idpf: add vc functions to manage selected queues

    Implement VC functions dedicated to enabling, disabling and configuring
    arbitrarily selected queues.
    
    Also, refactor the existing implementation to make the code more
    modular. Introduce new generic functions for sending VC messages
    consisting of chunks, in order to isolate the sending algorithm
    and its implementation for specific VC messages.
    
    Finally, rewrite the function for mapping queues to q_vectors using the
    new modular approach to avoid copying the code that implements the VC
    message sending algorithm.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Co-developed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: dd01c33
  39. idpf: add XSk pool initialization

    Add functionality to set up an XSk buffer pool, including the ability to
    stop, reconfigure and start only selected queues, not the whole device.
    Pool DMA mapping is managed by libeth.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    michalQb authored and alobakin committed Jul 17, 2024
    SHA: 62a153f
  40. idpf: implement Tx path for AF_XDP

    Implement Tx handling for AF_XDP feature in zero-copy mode using
    the libeth (libeth_xdp) XSk infra.
    When the NAPI poll is called, XSk Tx queues are polled first,
    before regular Tx and Rx. They're generally faster to serve and
    have higher priority compared to regular traffic.
    
    Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: af817d9
  41. idpf: implement Rx path for AF_XDP

    Implement Rx packet processing specific to AF_XDP ZC using the libeth
    XSk infra. Initialize queue registers before allocating buffers to
    avoid redundant ifs when updating the queue tail.
    
    Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: 3ce0d15
  42. idpf: enable XSk features and ndo_xsk_wakeup

    Now that AF_XDP functionality is fully implemented, advertise XSk XDP
    feature and add .ndo_xsk_wakeup() callback to be able to use it with
    this driver.
    
    Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    alobakin committed Jul 17, 2024
    SHA: de4b645
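
    A minimal sketch of advertising the features and wiring the callback;
    the idpf_xsk_wakeup name and the exact feature set are assumptions:

	static const struct net_device_ops idpf_netdev_ops = {
		/* ... existing callbacks ... */
		.ndo_xsk_wakeup		= idpf_xsk_wakeup,	/* assumed name */
	};

	/* at probe/registration time */
	xdp_set_features_flag(netdev, NETDEV_XDP_ACT_BASIC |
				      NETDEV_XDP_ACT_REDIRECT |
				      NETDEV_XDP_ACT_XSK_ZEROCOPY);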

Commits on Jul 18, 2024

  1. idpf-linux: block changing ring params while af_xdp is active

    Ring parameters, especially the ring size, should not
    be modified while an AF_XDP socket is assigned to any Rx ring.

    Implement a function that checks all Rx queues for an assigned
    AF_XDP socket and block changing queue parameters if at least
    one Rx queue has an AF_XDP socket.
    
    Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
    michalQb committed Jul 18, 2024
    SHA: 04fdca7
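
    A minimal sketch of the guard described above, with a hypothetical
    helper and queue accessor (the actual names in the patch may differ):

	static bool idpf_xsk_any_rxq_active(const struct idpf_vport *vport)
	{
		for (u32 i = 0; i < vport->num_rxq; i++)
			if (idpf_rxq_has_xsk_pool(vport, i))	/* hypothetical */
				return true;

		return false;
	}

	/* in the ethtool .set_ringparam callback */
	if (idpf_xsk_any_rxq_active(vport))
		return -EBUSY;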