diff --git a/README.md b/README.md
index ecda05efc..28424fef4 100644
--- a/README.md
+++ b/README.md
@@ -50,5 +50,5 @@ Any use of third-party trademarks or logos are subject to those third-party's po
-
+
diff --git a/documentation/general/design/dash-sonic-hld.md b/documentation/general/design/dash-sonic-hld.md
index 6f59aa371..88979800a 100644
--- a/documentation/general/design/dash-sonic-hld.md
+++ b/documentation/general/design/dash-sonic-hld.md
@@ -219,7 +219,7 @@ DASH_ACL_OUT:{{eni}}:{{stage}}
 ```
 
 ```
-key = DASH_ACL_IN:eni:stage ; ENI MAC and state as key; ACL stage can be {1, 2, 3 ..}
+key = DASH_ACL_IN:eni:stage ; ENI MAC and stage as key; ACL stage can be {1, 2, 3 ..}
 ; field = value
 acl_group_id = ACL group ID
 ```
@@ -516,7 +516,7 @@ DASH_VNET:Vnet1: {
 DASH_ENI:F4939FEFC47E : {
  "eni_id": "497f23d7-f0ac-4c99-a98f-59b470e8c7bd",
  "mac_address": "F4939FEFC47E",
- "pa_addr": 25.1.1.1,
+ "underlay_ip": 25.1.1.1,
  "admin_state": "enabled",
  "vnet": "Vnet1"
 }
@@ -604,7 +604,7 @@ For the example configuration above, the following is a brief explanation of loo
  d. Mapping table for 10.1.1.1 shall be hit and it takes the action "vnet_encap".
  e. Encap action shall be performed and use PA address as specified by "underlay_ip"
 2. Packet destined to 10.1.0.1:
- a. LPM lookup hits for entry 10.1.0.24/24
+ a. LPM lookup hits for entry 10.1.0.0/24
  b. The action in this case is "vnet" and the routing type for "vnet" is "maprouting", with overlay_ip specified
  c. Next lookup shall happen on the "mapping" table for Vnet "Vnet1", but for overlay_ip 10.0.0.6
  d. Mapping table for 10.0.0.6 shall be hit and it takes the action "vnet_encap".
diff --git a/documentation/high-avail/design/README.md b/documentation/high-avail/design/README.md
index 3a234eba3..ba3c38ed9 100644
--- a/documentation/high-avail/design/README.md
+++ b/documentation/high-avail/design/README.md
@@ -12,3 +12,4 @@ This folder contains DASH High Avalability and Scale design and architecture doc
 | ------------------------------------------------------ | ------------------------------------------ |
 | [high-availability-and-scale.md](high-availability-and-scale.md) | DASH High-Availability and Scale design document |
 | [xsight-labs-ha-proposal-v1.md](xsight-labs-ha-proposal-v1.md) | Initial HA proposal document |
+| [xsight-labs-ha-proposal-new-ideas.md](xsight-labs-ha-proposal-new-ideas.md)|Addendum to the initial HA proposal (preview)|
diff --git a/documentation/high-avail/design/images/ha-state-sync-packet-format.svg b/documentation/high-avail/design/images/ha-state-sync-packet-format.svg
new file mode 100644
index 000000000..1f0fe01de
--- /dev/null
+++ b/documentation/high-avail/design/images/ha-state-sync-packet-format.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Figure: State Synchronization Packet Format. Fields in order: Ethernet, IP, UDP, Flags, Opaque, Msg 1, Opaque, Msg 2, Opaque, ..., Msg N, Opaque. Annotations: "Truncated (for replies, when requested)"; "Receiver specifies “N”, maximum messages allowed per packet (can be 1)"; "Sender may choose to send 1 to N messages per packet".]
\ No newline at end of file
diff --git a/documentation/high-avail/design/images/ha-sync-operations.svg b/documentation/high-avail/design/images/ha-sync-operations.svg
new file mode 100644
index 000000000..b9a7af397
--- /dev/null
+++ b/documentation/high-avail/design/images/ha-sync-operations.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Figure: HA sync operations. DPU 1 and DPU 2 each contain a Sender and a Receiver; State Updates and Replies are exchanged between them over a Lossy Channel. Labels: "Fixed Behavior: Simple & Stateless" and "Flexible Behavior: Stateless or Stateful".]
\ No newline at end of file
diff --git a/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md b/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md
new file mode 100644
index 000000000..9a8c7ccb7
--- /dev/null
+++ b/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md
@@ -0,0 +1,126 @@
+# DASH High Availability (HA) proposal preview
+
+By John Carney, Xsight Labs
+
+There has been disagreement among members of the DASH community about the HA
+requirements, tradeoffs, and proposed protocols/approaches. Each vendor has
+unique architectures and constraints. There are many tradeoffs, including the
+fidelity of HA/fault tolerance as well as the bandwidth/processing costs.
+
+## Desirable HA properties and goals
+
+The following are desirable HA properties/goals (there are probably more). Due
+to differences in architectures, constraints, and deployment use cases, I
+propose that these remain qualitative and not be quantified as strict requirements
+of DASH. A DASH buyer may quantify any of these properties as strict
+requirements for their particular deployment.
+
+- Minimize or eliminate the possibility of established connections breaking
+after a failover. If the endpoints of a connection have gotten into the
+established state prior to a failover, then the connection should not be black
+holed after a failover.
+
+- Minimize the time to remove closed connections to avoid filling the connection
+table with zombie connections. If a connection is closed/removed on one DPU,
+then the connection should be quickly removed on the peer DPU. Zombie
+connections will eventually age out. There should be some tolerance for a small
+or bounded number of zombie connections in the connection table, especially
+after a failover.
+
+- Minimize the necessity for the endpoints to retransmit packets in order to
+"replay" packets that cause state changes and are dropped due to HA transport or
+processing constraints.
+
+- Minimize link bandwidth and DPU processing overhead for HA state
+synchronization.
+
+Some of the above may represent conflicting goals for a particular HA approach.
+For example, one HA approach may be able to minimize/eliminate the possibility
+of breaking established connections by consuming more bandwidth for HA. Such
+tradeoffs are appropriate in different use cases.
+
+We are now working on a proposal for an HA protocol definition that will provide
+HA interoperability while also enabling flexibility for each vendor to achieve
+the above properties, given their own architecture, constraints and chosen
+tradeoffs. Each vendor can individually quantify and be tested on the merits of
+the HA properties described above. DASH should neither resort to a "least common
+denominator" approach nor force complexity and HA modes that are too costly or
+unimplementable for some vendors. The buyer of a DASH solution can test the
+vendor's compliance with the defined HA protocol and decide if the vendor's
+tradeoffs and adherence to the desired HA properties meet the requirements for
+their use case.
+
+We can publish the proposal with much more detail at a later date; in the
+meantime, a **preview is shown below**.
+
+## Proposal preview
+
+For state synchronization there are two roles: a state **sender** and a state
+**receiver**. Each DPU implements both roles. There are two types of HA messages:
+"state update" and "packet update". These will be defined in more detail in the
+proposal.
+
+The receiver must be able to parse and process both types of messages. The
+sender may choose to coalesce multiple synchronization messages into a single
+state synchronization packet; however, the receiver will advertise the maximum
+number of coalesced messages supported. The sender must honor this. A receiver
+can specify this to be 1. The receiver's processing of HA packets/messages is
+defined to be **simple and stateless**.
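As a rough illustration of the coalescing rule just described, the Python sketch below splits pending messages into packets that never exceed the maximum the receiver advertised. The `SyncMessage` type, the `coalesce` and `receiver_accepts` helpers, and the byte payloads are hypothetical names invented for this example; the real message formats are left to the full proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SyncMessage:
    kind: str       # "state_update" or "packet_update", the two types named above
    payload: bytes  # hypothetical: the real contents are not defined in this preview


def coalesce(messages: List[SyncMessage], max_per_packet: int) -> List[List[SyncMessage]]:
    """Sender side: group messages into packets of at most max_per_packet each."""
    if max_per_packet < 1:
        raise ValueError("receiver must allow at least one message per packet")
    return [messages[i:i + max_per_packet]
            for i in range(0, len(messages), max_per_packet)]


def receiver_accepts(packet: List[SyncMessage], advertised_max: int) -> bool:
    """Receiver side: a stateless check that looks only at the packet in hand."""
    known_kinds = {"state_update", "packet_update"}
    return (1 <= len(packet) <= advertised_max
            and all(m.kind in known_kinds for m in packet))
```

With an advertised maximum of 1, `coalesce` degenerates to one message per packet, which matches the case where a receiver chooses not to support coalescing at all.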
+
+The sender of HA state synchronization updates has the full flexibility to be
+stateless or stateful. The sender will specify with each state synchronization
+packet whether a reply (completion) is requested and a hint of whether the reply
+may be truncated. The reply is simply the original HA state synchronization
+packet with a reply flag set and is possibly truncated (it is allowed, but not
+bandwidth optimized, for the receiver to not truncate the reply when requested).
+The sender may optionally include opaque information with each individual
+message in the synchronization packet and/or for the synchronization packet as a
+whole. When the reply is returned to the sender, the opaque information can be
+used in an implementation-specific manner to accomplish stateless or stateful
+synchronization operations.
+
+![ha-sync-operations](images/ha-sync-operations.svg)
+
+Here are some examples of different HA approaches that are possible with this
+simple protocol. A vendor may select among these (or other possible) approaches.
+A vendor may limit their HA implementation to only the approach(es) that are
+possible, feasible, or best for their architecture. The definition of the
+receiver behavior is simple and remains independent of, but interoperable with,
+any sender approach.
+
+1. The sender may send packets that cause state updates to the receiver and
+   have them returned for transmission to the endpoint. Packets dropped due to
+   transport unreliability or exceeded DPU processing limits are retransmitted
+   by the endpoints.
+2. The sender may send a state update message with each state change event to
+   the receiver without requesting replies. The sender may periodically resend
+   the entire connection state without requesting replies.
+3. With each packet that causes a state change, the sender may buffer/hold the
+   packet and then send a state change message with an opaque value to the
+   receiver, requesting a reply. When the opaque value is returned with the
+   reply, it is associated with the held packet that is then transmitted to the
+   endpoint. If the reply is not returned in a timely manner, the sender may
+   drop the held packet and free the buffer, relying on the endpoints to
+   retransmit dropped packets. Alternatively, the sender may choose to resend
+   the synchronization message to the receiver, effectively creating a reliable
+   transport without imposing any impact on the endpoints.
+
+In addition to the above, there are many other possible tradeoffs that a sender
+may make. The sender may choose to not send certain state change events for a
+connection. This may save bandwidth at the expense of some fault tolerance. The
+sender may choose to buffer/hold only certain packets and not others. For
+example, there may be less value in buffering/holding syn packets. If the peer
+does not learn of the syn and there is a failover, the syn-ack will be dropped
+by the peer and the syn will be retransmitted by the endpoint. Anyone may
+contribute a full definition and analysis of any of the sender approaches above, or
+other possible approaches/optimizations, as optional "standardized" DASH HA
+sender modes. These modes only affect the sender behavior. All sender modes use
+the same protocol definition and the receiver behavior will always remain
+simple, stateless and interoperable with all sender modes.
+
+![ha-state-sync-packet-format](images/ha-state-sync-packet-format.svg)
+
+We can produce the proposal with more detail, including the full packet/message
+formats. We are happy to get early feedback and discussion before formalizing
+the proposal. I wanted to get this out to you so that you can think about it,
+and possibly respond, before the next meeting.
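Sender approach 3 from the list above, paired with the stateless receiver, might look roughly like the following Python sketch. Everything concrete here is an assumption made for illustration: the flag bit values, the `SyncPacket` layout, the 4-byte counter used as the opaque value, and the choice to drop message bodies but keep the opaque value when a reply is truncated. None of these come from the proposal, which has not yet published packet formats.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple

# Hypothetical flag bits; the preview does not assign values.
FLAG_REPLY_REQUESTED = 0x1
FLAG_MAY_TRUNCATE = 0x2
FLAG_IS_REPLY = 0x4


@dataclass(frozen=True)
class SyncPacket:
    flags: int
    opaque: bytes                # packet-level opaque value, echoed back untouched
    messages: Tuple[bytes, ...]  # coalesced state messages (format assumed)


def receiver_handle(pkt: SyncPacket,
                    apply_state: Callable[[bytes], None]) -> Optional[SyncPacket]:
    """Stateless receiver: apply each message, then echo the packet if asked."""
    for msg in pkt.messages:
        apply_state(msg)
    if not pkt.flags & FLAG_REPLY_REQUESTED:
        return None
    # Assumption: truncation drops the message bodies and keeps the opaque value.
    kept = () if pkt.flags & FLAG_MAY_TRUNCATE else pkt.messages
    return replace(pkt, flags=pkt.flags | FLAG_IS_REPLY, messages=kept)


class HoldingSender:
    """Approach 3: hold the data packet until the peer echoes the opaque value."""

    def __init__(self) -> None:
        self._held: Dict[bytes, bytes] = {}
        self._next_id = 0

    def on_state_change(self, data_packet: bytes, state_msg: bytes) -> SyncPacket:
        opaque = self._next_id.to_bytes(4, "big")
        self._next_id = (self._next_id + 1) % 2**32
        self._held[opaque] = data_packet
        return SyncPacket(FLAG_REPLY_REQUESTED | FLAG_MAY_TRUNCATE, opaque, (state_msg,))

    def on_reply(self, reply: SyncPacket) -> Optional[bytes]:
        """Return the held packet to forward to the endpoint, if still held."""
        if not reply.flags & FLAG_IS_REPLY:
            return None
        return self._held.pop(reply.opaque, None)

    def on_timeout(self, opaque: bytes) -> None:
        """Give up on a reply: free the buffer and rely on endpoint retransmission."""
        self._held.pop(opaque, None)
```

The receiver keeps no per-sender bookkeeping, so the same `receiver_handle` would also serve approach 2: a sender that never sets the reply-requested flag simply gets no reply.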
diff --git a/documentation/high-avail/slides/DASH High Availability.pptx b/documentation/high-avail/slides/DASH High Availability.pptx
index 55b7b9d23..d6f81d17a 100644
Binary files a/documentation/high-avail/slides/DASH High Availability.pptx and b/documentation/high-avail/slides/DASH High Availability.pptx differ
diff --git a/documentation/images/icons/dash-icon-large.svg b/documentation/images/icons/dash-icon-large.svg
new file mode 100644
index 000000000..fcfddde27
--- /dev/null
+++ b/documentation/images/icons/dash-icon-large.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/documentation/images/icons/dash-icon-medium.svg b/documentation/images/icons/dash-icon-medium.svg
new file mode 100644
index 000000000..793c62b37
--- /dev/null
+++ b/documentation/images/icons/dash-icon-medium.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/documentation/images/icons/dash-icon-small.svg b/documentation/images/icons/dash-icon-small.svg
new file mode 100644
index 000000000..59f8357f9
--- /dev/null
+++ b/documentation/images/icons/dash-icon-small.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/documentation/images/icons/dash-icon-xlarge.svg b/documentation/images/icons/dash-icon-xlarge.svg
new file mode 100644
index 000000000..fc66d239f
--- /dev/null
+++ b/documentation/images/icons/dash-icon-xlarge.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file