Straight NIC #801
Comments
If this is feasible, it would be excellent in many ways. The only downside is that it'll absolutely fully load a whole CPU core, and the budget per packet will be measured in the low tens of nanoseconds. I wonder how often we'll run into thermal throttling of other cores on the same physical CPU... |
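To put a number on that per-packet budget, here is a small sketch of the arithmetic. It assumes worst-case minimum-size 64-byte frames plus the standard 20 bytes of preamble and inter-frame gap per frame on the wire; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* On-wire cost beyond the frame itself: 8-byte preamble/SFD plus the
 * 12-byte inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20

/* Time in nanoseconds between back-to-back frames of `frame_bytes`
 * on a link running at `gbps` gigabits per second. */
static double ns_per_packet(double gbps, unsigned frame_bytes)
{
    assert(gbps > 0.0);
    double bits = 8.0 * (frame_bytes + WIRE_OVERHEAD_BYTES);
    return bits / gbps;  /* bits divided by Gbit/s gives nanoseconds */
}

/* Print the budget for minimum-size (64-byte) frames at common rates. */
static void print_budgets(void)
{
    const double rates[] = { 10.0, 40.0, 100.0 };
    for (unsigned i = 0; i < sizeof rates / sizeof rates[0]; i++)
        printf("%5.0fG: %.2f ns per 64-byte packet\n",
               rates[i], ns_per_packet(rates[i], 64));
}
```

Calling `print_budgets()` from a `main` gives about 67.2 ns at 10G, 16.8 ns at 40G and 6.72 ns at 100G, so a single dispatching core only has tens of nanoseconds per packet at 10G and less than that at higher rates.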
AFAIU with this design, you could connect directly to the NIC if you wanted to handle all of its traffic, so no difference from the current status. Just brainstorming the sorts of things we'd need to write if we do this :)
Would we use virtio-style queues between processes or something bespoke? I don't know the tradeoffs. I would lean towards bespoke, of course :) Some things that this would enable us to have:
Needless to say I like it and would love to have this as part of the Snabb story, contingent on it actually working :-)) |
Are you thinking of copying packet data from the dispatcher to sub-process ring buffers, or sharing packets in some global pool and passing around descriptors instead?
struct ipc_link {
    struct packet packets[LINK_RING_SIZE];
    uint64_t read, write;
};
^ that would be a delightful link, if it performed ok :) |
Though, I guess since you can't write a packet atomically, you run the risk of corruption. I suppose you could detect that via how far the write pointer advanced while you were reading... a big challenge in any case :) |
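For comparison, here is a rough sketch of a copying link that sidesteps the corruption problem instead of detecting it: the writer refuses to overwrite unread slots and drops the packet when the ring is full. This is a plain single-producer/single-consumer ring, not something taken from Snabb. The `struct packet` layout, the sizes, and the function names `ipc_link_transmit` / `ipc_link_receive` are all assumptions, and it presumes the struct lives in memory shared by both processes and that 64-bit atomics are lock-free on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINK_RING_SIZE 256        /* assumed; must be a power of two */
#define PACKET_PAYLOAD_SIZE 2048  /* assumed payload capacity */

struct packet {                   /* hypothetical minimal packet */
    uint16_t length;
    unsigned char data[PACKET_PAYLOAD_SIZE];
};

struct ipc_link {
    struct packet packets[LINK_RING_SIZE];
    _Atomic uint64_t read;   /* count of packets consumed so far */
    _Atomic uint64_t write;  /* count of packets produced so far */
};

/* Producer side: copy one packet in. Returns false (drop) when full,
 * so a slot is never rewritten while the consumer may still read it. */
static bool ipc_link_transmit(struct ipc_link *l, const void *data,
                              uint16_t length)
{
    assert(length <= PACKET_PAYLOAD_SIZE);
    uint64_t w = atomic_load_explicit(&l->write, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&l->read, memory_order_acquire);
    if (w - r == LINK_RING_SIZE)
        return false;
    struct packet *p = &l->packets[w & (LINK_RING_SIZE - 1)];
    p->length = length;
    memcpy(p->data, data, length);
    /* Publish the slot only after its contents are fully written. */
    atomic_store_explicit(&l->write, w + 1, memory_order_release);
    return true;
}

/* Consumer side: copy one packet out. Returns false when empty. */
static bool ipc_link_receive(struct ipc_link *l, struct packet *out)
{
    uint64_t r = atomic_load_explicit(&l->read, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&l->write, memory_order_acquire);
    if (r == w)
        return false;
    const struct packet *p = &l->packets[r & (LINK_RING_SIZE - 1)];
    out->length = p->length;
    memcpy(out->data, p->data, p->length);
    /* Hand the slot back to the producer only after the copy is done. */
    atomic_store_explicit(&l->read, r + 1, memory_order_release);
    return true;
}
```

The trade-off is that a slow consumer causes drops at the producer rather than silent overwrites. A never-blocking, overwriting writer would need the detection scheme described above instead.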
Isn't it already? Or does the NIC stamp src MAC when VMDq is enabled? It cannot reasonably put dst MAC in there (how would it know?). I say it stays in app network. I'm torn over the subject in general. Sure, simplification sounds like a good thing but on the other hand I want 100G and I don't think it's reasonable to expect a single core to be able to take over the RSS job from the NIC, so I'd like RSS. At the same time I don't know enough about the drivers, are they really that complex? |
I have also been feeling conflicted in exactly the way @plajjan describes. On the one hand the pure software implementation is compellingly simple but on the other hand it feels too optimistic to assume this will always be sufficient. On reflection now I feel that neither extreme is right - all hardware or all software - and I have a proposal for a compromise. Let me review the extremes and then spell out the proposal for a compromise.
Hardware-oriented approach
The fundamental trouble with the all-hardware approach is dealing with the variability between NICs. How do you write drivers with a consistent interface when the hardware they are controlling has different behavior? For example, here are some of the questions that have different answers from one NIC to the next:
One way to deal with this is to say that drivers will each have a unique interface that describes the card they support. This punts the problem up to the application developer e.g. to pick one card to target and deal with its peculiarities. This is basically the situation today e.g. the NFV application has targeted the Intel 82599 specifically and when we added Solarflare support we worked with them to extend their firmware for feature parity. However, in the big picture this is not so satisfactory. One problem is that it does not help applications to support many NICs and still benefit from the hardware that is available. Another problem is that hardware sometimes lets you down later in the lifecycle -- for example you get a new requirement to support a transport protocol that the NIC does not understand (e.g. GTP, L2TP, MPLS, etc) and that could screw up your application by hashing all your input into the same bucket -- forcing you to redesign the application or to move to new hardware.
Software-oriented approach
The all-software approach has the virtue of uniformity and predictability. This is huge, as described above. The primary problem is uncertainty about what this costs: How much CPU must you reserve for dispatching? Does it create a bottleneck that limits your peak packet rate? Then once you have these answers the secondary problem is: Is it worth it? For which applications? I would love to reach the point where we can say that software dispatching has modest overhead and is the right choice for all but the most extreme applications. However, in practice it is a lot of work to understand the possibilities, and there is no guarantee that the results will represent the right trade-off for application developers e.g. that the NFV application would really be better off switching from VMDq to a software switch+tag app.
Proposed compromise
I think we need to make a pragmatic compromise that takes advantage of hardware features when available, software-emulates hardware features that are not available, and provides well-defined interfaces to application developers. I have a proposal for how we could do this and it is one of those "turtles all the way down" ideas. The idea is that we could define abstract apps that become concrete when you instantiate them. The That is the whole idea. Let me give an example to be more concrete. Suppose we developed a This app can have multiple ports attached to it, can dispatch packets based on DMAC and/or VLAN, can be configured to automatically insert/remove 802.1Q VLAN tags, and can optionally provide all of the interface counters required by SNMP MIB-II objects. Suppose that you instantiate the Now suppose that you instantiate the If the config required SNMP counters to be provided then we may also need to include a software app that inspects the packets and updates the appropriate tallies (unicast packets, broadcast packets, etc). This would mean that applications depending on VMDq-like functionality could be deployed on every I/O medium that the The overall effect would seem to be to make life easy for driver developers (just implement what the card supports), and to make life easy for application developers (just pick the abstraction that suits your application), but difficult for people writing the abstract apps (the This would also allow the hardware vs software battle to continue in the background. Vendors can add powerful new features to NICs, hackers can write optimized software alternatives, and over time we will see if these keep each other in balance or whether the wind blows one way or the other. |
@lukego I love it! I was actually thinking about something similar although I think you expressed my abstract spaghetti thoughts into something much more concrete. For my SnabbDDoS program I want to support:
Single NIC (82599 or tap) requires VLAN tagging (unless we want really weirdo config on the router with PBR or similar). Now, to complicate things I have an issue on the 82599EB-old-a-f card that vmdq doesn't seem to work properly so for most 82599 cards I want vmdq but for this particular version I want to use software for VLAN demux/muxing. Thus the final config matrix is rather complicated. The current state of the code in SnabbDDoS program for initing the app network is a mess and it makes me sad panda just looking at it. I was thinking if I could abstract the 82599-vmdq-or-software-vlan into an app to simplify things and I think this is just what you have described here but in a much more generic and elegant way. Complexity of switched_nic could, as you point out, shoot through the roof if we are not careful. It's probably a good idea to start out with a minimal feature set and ignore more advanced features to begin with. I'm thinking just matching the vmdq feature set, like vlan (de)tag and src mac rewrite, so it's noop on 82599 but it will do things for a tap interface. |
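As an illustration of the software side of that minimal feature set, here is a sketch of 802.1Q detagging done in place on a raw Ethernet frame, the kind of work a tap-backed setup would have to do itself. The function name `vlan_detag` and the return convention are made up for this example; only the tag layout (TPID 0x8100 after the two MAC addresses, VLAN ID in the low 12 bits of the TCI) comes from the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDRS_LEN 12      /* destination MAC + source MAC */
#define VLAN_TAG_LEN    4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q      0x8100

/* If `frame` carries an 802.1Q tag, remove it in place, store the VLAN ID
 * in *vid and return the new length. Untagged or too-short frames are
 * returned unchanged with *vid set to -1. */
static size_t vlan_detag(uint8_t *frame, size_t len, int *vid)
{
    assert(frame != NULL && vid != NULL);
    *vid = -1;
    if (len < ETHER_ADDRS_LEN + VLAN_TAG_LEN + 2)
        return len;  /* no room for a tag plus the inner EtherType */
    uint16_t tpid = (uint16_t)(frame[12] << 8 | frame[13]);
    if (tpid != TPID_8021Q)
        return len;
    uint16_t tci = (uint16_t)(frame[14] << 8 | frame[15]);
    *vid = tci & 0x0FFF;  /* low 12 bits of the TCI are the VLAN ID */
    /* Slide the inner EtherType and payload down over the tag. */
    memmove(frame + ETHER_ADDRS_LEN,
            frame + ETHER_ADDRS_LEN + VLAN_TAG_LEN,
            len - ETHER_ADDRS_LEN - VLAN_TAG_LEN);
    return len - VLAN_TAG_LEN;
}
```

On an 82599 with VMDq doing the tagging in hardware this step would simply be skipped, which is the "noop on 82599" case.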
What you describe sounds like relatively heavy weight composite apps and is using the
I'm all in favor of the StraightNIC idea with the extension of RSS support. It's my feeling that most of the acute performance concerns that lead to fancy hardware and complex drivers could be alleviated by having more cores. Some app designs might not parallelize to many cores nicely, but it's my slightly biased opinion that they were going to hit problems anyway. If the role is 'popular' enough to need more than 1 core it probably also needs more than 1 10gig link, more than 1 server and more than one site. So you may as well figure out how to scale horizontally from the start. Not all NICs support all RSS hash modes, but then there aren't many guarantees on what upstream ECMP hashing will look like either. Huge flows that would overwhelm a given path are a pain but they always have been and always will be. 100gig was great until we needed n*100gig :( |
@plajjan |
I was thinking about switched_nic as a form of config helper, not that it actually does all the work. For example, I instantiate a switched_nic and say I want VLAN X on NIC Y. If Y=02:00.0 is an 82599 it will be configured with VMDq for VLAN tagging. If Y happens to be a tap interface then switched_nic will instantiate a Tap driver and a VlanMux and connect those together. Same end result, something that takes input packets and strips VLAN tags!
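Roughly, the config-helper idea could look like the sketch below. Everything in it is hypothetical: the enum, the app names and the `switched_nic_plan` function are invented for illustration and are not Snabb APIs. The point is only that the caller asks for "VLAN X on device Y" and the helper decides whether hardware does the work or a software app is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical I/O kinds the helper knows about. */
enum io_kind { IO_INTEL_82599, IO_TAP };

#define MAX_APPS 2

/* The apps the helper decided to instantiate, in link order. */
struct app_plan {
    const char *apps[MAX_APPS];
    int napps;
    bool hw_vlan;   /* true when the NIC handles VLAN tags itself */
    int vlan;
};

/* Decide how to provide "VLAN `vlan` on this device". `vmdq_usable`
 * lets the caller opt out of VMDq on cards where it misbehaves. */
static struct app_plan switched_nic_plan(enum io_kind kind, bool vmdq_usable,
                                         int vlan)
{
    assert(vlan >= 1 && vlan <= 4094);
    struct app_plan plan = { .napps = 0, .hw_vlan = false, .vlan = vlan };
    if (kind == IO_INTEL_82599 && vmdq_usable) {
        /* Hardware path: one driver app configured for VMDq. */
        plan.apps[plan.napps++] = "intel_82599_vmdq";
        plan.hw_vlan = true;
    } else {
        /* Software path: plain driver followed by a VLAN mux app. */
        plan.apps[plan.napps++] =
            (kind == IO_TAP) ? "tap" : "intel_82599_plain";
        plan.apps[plan.napps++] = "vlan_mux";
    }
    return plan;
}
```

Either way the caller gets the same end result, untagged packets for the requested VLAN, without knowing which path was taken.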
No, not really. It doesn't sound like a lot of work but that's probably because we have different ideas of what it is or should do. To me it's a convenience API so I don't have to think about what the NIC supports or not. Just adding the apps you list won't do anything. It's kind of the situation we have today. I already have an app that adds/removes VLANs (#863) but how do I add that to my app network and when do I need to? Should I never use VMDq so my app network is consistent? (StraightNIC!) or should I allow hardware offload for some stuff. That complexity is what I would like to abstract away.
Well, this is a problem for my program not my app. The DDoS app doesn't care about the NIC, it just wants packets. But the SnabbDDoS program does need to care, how else would this whole thing work? Also, funny that you would mention MPLS since I'm working with a network where half the ho-ha is about us not using MPLS ;)
Okay. So it sounds like you have bigger things in mind here. I was thinking of switched_nic as something I'd write in a day. Rewriting Snabb's config to be something other than Lua sounds like a slightly bigger topic. The way things are right now, Snabb offers nothing in this area. It is up to each developer to build a program that takes the configuration options they wish to support and in effect that will limit the deployment options you have as a user. I personally have no need for MPLS so I will not add that to SnabbDDoS. I definitely think there are improvements to be made here. I am not really interested in writing all this "glue" stuff for deployment. Snabb doesn't even have MPLS support today but let's say I upstream SnabbDDoS and someone adds MPLS tomorrow, wouldn't it be neat if, like you express, a user could just add some config options and use SnabbDDoS in an MPLS deployment? I think so! Snabb is really just a basic framework at this point. If we look at the only other thing out there that is vaguely similar - VPP - there is still a stark contrast in what they provide. Both Snabb and VPP have the ideas of nodes in a graph that each do one little thing well. Both use batching and various other clever techniques to achieve high performance. But VPP offers a lot more out of the box. You can spin one up, configure a couple of interfaces and some static routes, through a CLI, and have it forward packets for you. It's like Snabb but there is a default app network which provides you with lots of stuff. I wrote a DDoS app, a node in the graph/app network.
I guess Snabb will move in the direction of VPP. Once there are more standard apps available it makes sense to ship most of these in a default app network. Developers are always allowed to start over with a clean slate if they wish but they shouldn't need to. @petebristow I don't want my NMS to have to worry about whether NIC + VLAN is actually two apps or if one app will do the job. Internal app network is largely irrelevant to external parties/users. Who wants to think about having to put a "reassembler" app in there? It makes sense from a dev perspective to have it as a separate app but from a user perspective it sucks having to think about it. |
@lukego |
I am on board with this vision! The code on #897 looks very good to me. I would be happy to bring in an 82599 driver in this style. Have you considered writing an X710/XL710 driver rather than 82599 next? This would punt the compatibility issues down the road a little and also give us a working driver for cool new hardware that is abundant in the lab :-) e.g. VMDq could be supported in the same way as RSS here, no? |
The problem with the X710 is that I have none of them in my network but have a huge number of 82599 that I would like to power with snabb but need RSS in place first. |
Cool just checking :). I am hopeful that RSS and VMDq can work in combination in a simple way but we shall see. |
Just to chime in here: from the SnabbNFV perspective there is currently one NIC App per MAC-address (VMDq), RSS is not applicable if I understand correctly. If the 82599 were single-app/many-links in VMDq mode instead of “sub-apps” that would be much better for my current concerns in #886. I don't like the multi-app approach of the 82599 because it has all the disadvantages but doesn't actually yield any benefits. Assuming that the same thing that #897 does for RSS can be done for VMDq on the other hand makes this approach interesting. So I see one I/O interface that is 1-N (which I would love to have right now) and one that is 1-1 (which I think might be smarter): From the SnabbNFV perspective the disadvantage for 1-1 (the first on the diagram) would be possible locking overhead even though everything runs in a single process/thread. On the other hand there is a big advantage in that the SnabbNFV application could be horizontally scaled across cores? We would end up with interfaces still, but simpler ones:
|
Looking at this two years after the thread started: netdev has made a lot of inroads here, and tc_offloading is now a common driver feature. I feel this could be implemented using tc and bcc, and would complement Snabb.
Here is a radical idea that came up in a conversation with @wingo:
To adopt a "StraightNIC" design where device drivers are absolutely minimal: one transmit queue, one receive queue, end of story. The rest is done in software and works exactly the same for all I/O sources (1G/10G/40G/100G, Intel/Mellanox/Virtio/Tuntap, ...).
This would be following the example of the "straightline" redesign where we made our packet struct absolutely minimal by removing scatter-gather buffer chains, checksum metadata, separate memory pools, etc. This invalidated a lot of special-case optimizations but the overall result has been simpler code and better performance.
The gambit is that focusing on the simplest and most general case leads to the best overall outcome i.e. expect that special-case optimizations like VMDq, RSS, FlowDirector, etc, are a net loss in the big picture because they lead to user-visible inconsistency and soak up development time e.g. when trying to add new hardware support with sufficient functionality.
Thoughts???
See also I/O 2.0 (#687). This would also depend very much on highly optimized software replacements for the relevant NIC functions (#691).
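To make "software replacement for a NIC function" a bit more concrete, here is a minimal sketch of an RSS-like dispatcher: hash the flow fields of each packet and use the result to pick a receive queue, so every packet of a flow lands on the same worker. It uses FNV-1a purely for brevity; real NICs typically use a Toeplitz hash for RSS, and the `flow_key` struct and function names here are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key, assumed to be already parsed from the packet. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

/* Fold `len` bytes into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;  /* 32-bit FNV prime */
    }
    return h;
}

/* Hash the fields one by one so struct padding never enters the hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;  /* 32-bit FNV offset basis */
    h = fnv1a(h, &k->src_ip, sizeof k->src_ip);
    h = fnv1a(h, &k->dst_ip, sizeof k->dst_ip);
    h = fnv1a(h, &k->src_port, sizeof k->src_port);
    h = fnv1a(h, &k->dst_port, sizeof k->dst_port);
    h = fnv1a(h, &k->protocol, sizeof k->protocol);
    return h;
}

/* Choose one of `nqueues` receive queues for this flow. */
static unsigned pick_queue(const struct flow_key *k, unsigned nqueues)
{
    assert(nqueues > 0);
    return flow_hash(k) % nqueues;
}
```

Whether something like this can keep up at line rate on one core is exactly the open question. The upside is that the key can be built from whatever headers the software understands, rather than only what a given NIC can parse.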