Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ethernet core #8

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open

Add Ethernet core #8

wants to merge 28 commits into from

Conversation

t-wallet
Copy link
Collaborator

@t-wallet t-wallet commented Aug 28, 2024

This PR contains an Ethernet II + IPv4 + UDP core.

Still TODO:

t-wallet and others added 2 commits August 28, 2024 16:00
Co-authored-by: Jasmijn Bookelmann <bookelmannjasmijn@gmail.com>
Co-authored-by: Cato van Ojen <Baublesaurus@users.noreply.github.com>
Co-authored-by: MatthijsMu <93450301+MatthijsMu@users.noreply.github.com>
Co-authored-by: Jasper Laumen <96011116+JLaumen@users.noreply.github.com>
Co-authored-by: Mart Koster <30956441+Akribes@users.noreply.github.com>
Co-authored-by: Bryan Rinders <bryan__ajax@hotmail.com>
Co-authored-by: Daan Weessies <daan.weessies@gmail.com>
Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>
, (Maybe ArpResponse, (Maybe IPv4Address, Df.Data ArpLite)))
-- User issues a lookup request. We don't have a timeout, because the ARP table should
-- always respond within a reasonable time frame. If not, there is a bug in the ARP table.
arpManagerT AwaitLookup{..} (Just lookupIPv4, arpResponseIn, Ack readyIn, _) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why this _awaitTransmission is necessary? If you are in the AwaitLookup state and you receive a Just lookupIPv4 why can't you just forward that directly to the ARP table?

Copy link
Collaborator Author

@t-wallet t-wallet Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do forward the IP address unconditionally? _awaitTransmission is needed to be able to keep driving the same ARP request to the transmitter if the transmitter asserts backpressure.
It makes no difference for a single entry ARP table which has one clock cycle latency, which is why the old version worked in hardware. But with a multi-entry table you can have a longer latency. And that's where the notorious bug that stopped the multi-entry stack from working came from.

I.e. it sent an ARP request to the transmitter, the transmitter did nothing with it because he wasn't ready, but we already moved to the waiting state.

src/Clash/Cores/Ethernet/Arp/ArpTable.hs Outdated Show resolved Hide resolved
src/Clash/Cores/Ethernet/Arp/ArpTable.hs Outdated Show resolved Hide resolved
src/Clash/Cores/Ethernet/Examples/ArpStack.hs Outdated Show resolved Hide resolved
-- ^ My MAC Address
-> Signal dom (IPv4Address, IPv4Address)
-- ^ My IP address and the subnet
-> Circuit (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite)) (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure anymore of providing the UDP circuit as a parameter. Because it's very limiting in what you can then do with it.

In GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24#155 I have rewritten it so you have more flexibility. I would like your and others opinion on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like

Circuit
       (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthRx 1 ())
       (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthTx 1 ())

as a type for the UDP core. It clearly conveys that outgoing packets are either derived from incoming packets (i.e. ARP, UDP echo) or are coming from the outside (i.e. DHCP discovery). And that not all incoming packets are replied to.

The current version seems to assume that all incoming packets need to be replied to. Which is obviously not the case for anything but echos. So, let's change it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@t-wallet t-wallet Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Only thing that I added is that you really need a fifo after the IP depacketizer.

A UDP echo stack can now be expressed as:

udpEcho = circuit $ \phyIn -> do
  phyIn' <- unsafeRgmiiRxC ... -< phyIn
  udpOut' <- mapMeta (\(ip, hdr) -> (ip, hdr{_udplDstPort = _udplSrcPort hdr, _udplSrcPort = _udplDstPort hdr})) -< udpOut
  (udpOut, toTxPhy) <- fullStackC ... < (udpOut', phyIn')
  rgmiiTxC ... -< toTxPhy

src/Clash/Cores/Ethernet/Examples/FullUdpStack.hs Outdated Show resolved Hide resolved
src/Clash/Cores/Ethernet/IP/IPPacketizers.hs Show resolved Hide resolved
src/Clash/Cores/Ethernet/Icmp.hs Outdated Show resolved Hide resolved
(KnownNat dataWidth) =>
(1 <= dataWidth) =>
Circuit (PacketStream dom dataWidth ()) (PacketStream dom dataWidth ())
preambleStripperC =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure anymore that an incoming ethernet packet will always have the full preamble since it's partially used for clock recovery. @DigitalBrains1 Do you know this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is true, then we may want to investigate stripping the preamble in the Ethernet domain. It should be simple enough to not hurt timings that much, and I suspect even putting registers between the component will be cheaper than doing it for generic dataWidth.

Copy link
Member

@DigitalBrains1 DigitalBrains1 Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly sure a receiver can discard the preamble; not sure whether in whole or if there is a minimum number of bit times it needs to pass on for valid reception. I've quickly browsed through 802.3 and found the following things:

802.3-2018 section 4.2.5 Preamble generation

In a LAN implementation, most of the Physical Layer components are allowed to provide valid output some number of bit times after being presented valid input signals. Thus it is necessary for a preamble to be sent before the start of data, to allow the PLS circuitry to reach its steady state.

I think this says a receiver is allowed to only start outputting the preamble some number of bit times after the preamble starts, and the preamble is there to absorb the difference.

4.1.2.1.2 Reception without contention

The Physical Layer passes subsequent bits up to the MAC sublayer, where the leading bits are discarded, up to and including the end of the preamble and Start Frame Delimiter.

Not really conclusive, but I feel it leaves the door wide open to say the number of leading bits is inconsequential.

In section 4.2.9 Frame reception, IEEE 802.3 provides pseudo-Pascal programs to show what a working implementation of a CSMA/CD Media Access sublayer could look like. In it's BitReceiver process, which produces received frames, it strips off stuff by invoking another procedure; the call is documented:

PhysicalSignalDecap; {Skip idle and extension, strip off preamble and sfd}

The procedure itself is only described:

procedure PhysicalSignalDecap;
begin{Receive one bit at a time from physical medium until a valid sfd is detected, discard bits and return}end; {PhysicalSignalDecap}

Nowhere does it say anything about a check on the length of the preamble in the receiver.

Without referring to any standard, it also just makes sense. The preamble is, among other things, meant for synchronisation. In the first part of the preamble, you haven't synchronised yet, so you can't output it because it is still gibberish in some sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote two new versions of the preamble stripper that allow for a variable-length preamble in bytes. I believe it is reasonable to assume that the SFD is byte-aligned.

The first version just forwards fragments upon encountering the SFD:

data PreambleStripperState
  = ValidateSfd
  | Forward
  deriving (Generic, NFDataX, Show, ShowX)

preambleStripperC ::
  forall dom.
  HiddenClockResetEnable dom =>
  Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
preambleStripperC = fromSignals (mealyB go ValidateSfd)
 where
  go ValidateSfd (Just PacketStreamM2S{..}, _) =
    (nextSt, (PacketStreamS2M True, Nothing))
   where
    nextSt
      | isNothing _last && head _data == 0xD5 = Forward
      | otherwise = ValidateSfd
  go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
   where
    nextSt
      | isJust (_last transferIn) && _ready bwdIn = ValidateSfd
      | otherwise = Forward
  go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))

This version happily accepts packets with a longer preamble than normal just like the procedure described in the comment above. And prays that faulty packets are picked up by the CRC check.

Or we can go with a version that's a little stricter (but also slightly more expensive):

data PreambleStripperState
  = ValidateSfd {_counter :: Index 8}
  | Forward
  | Drop
  deriving (Generic, NFDataX, Show, ShowX)

preambleStripperC ::
  forall dom.
  HiddenClockResetEnable dom =>
  Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
preambleStripperC = fromSignals (mealyB go (ValidateSfd 0))
 where
  go ValidateSfd{..} (Just PacketStreamM2S{..}, _) =
    (nextSt, (PacketStreamS2M True, Nothing))
   where
    nextSt
      | isNothing _last && head _data == 0xD5 = Forward
      | isNothing _last && _counter == maxBound = Drop
      | otherwise = ValidateSfd (satSucc SatWrap _counter)
  go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
   where
    nextSt
      | isJust (_last transferIn) && _ready bwdIn = ValidateSfd 0
      | otherwise = Forward
  go Drop (Just transferIn, _) = (nextSt, (PacketStreamS2M True, Nothing))
   where
    nextSt
     | isJust (_last transferIn) = ValidateSfd 0
     | otherwise = Drop
  go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))

Which drops packets if the SFD was not detected in the first 8 bytes.

I personally prefer the latter, what do you guys think?

Copy link
Member

@DigitalBrains1 DigitalBrains1 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel more for the former. I don't feel the extra circuitry adds a useful enough feature.

Also, in the latter, it seems to me you can replace satSucc SatWrap _counter by just succ _counter. It always starts at 0 and never progresses beyond maxBound.

[edit] I would not oppose the latter if that's what you want anyway [/edit]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes? We cannot say that some bits are invalid. This is not possible with AXI Stream either.

Copy link
Member

@DigitalBrains1 DigitalBrains1 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we might also commit a version that requires byte alignment now, and create an issue noting this requirement and asking anyone who discovers they have reception issues to please inform us with details about their setup.

I'm not convinced this is a good solution, but personally I'm willing to accept this. I can't speak for other people.

[edit]
To clarify, when I posted this reply, GitHub did not show me the previous message, so I didn't ignore it, I was unaware of it.
[/edit]

Copy link
Member

@rowanG077 rowanG077 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look what our forefathers did (liteeth). And what they do two things:

Also by googling I did find a phy that could return non-byte aligned shortened preamble + SFD: https://e2e.ti.com/support/interface-group/interface/f/interface-forum/1233993/dp83848c-preamble-sfd-length-is-shortage. Personally I think this should be made a phy specific problem. It's trivial to detect an SFD when you know you get n-bits every cycle. But hard when you have already adapted it to a byte packetstream with completely unknown alignment.

Personally considering the state currently I would keep the SFD check byte aligned for now. We only have an RGMII phy anyway. Make a note that explains the above story and once the next Phy is actually implemented we fix the story for that phy.

Point of note is that I did run the older colorlight board RGMII Phy in 100mbit mode using liteeth in the past without issues. That means the SFD was byte aligned. The 8.2v rev of the PCB has a phy that does not support 100mbit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes?

I did not know about which place in the stack you were talking about. MII is a nibble-width protocol, and it's somewhat of a common unit in Ethernet. I thought you meant whether the nibbles of the SFD were byte-aligned, i.e., that the first nibble is always at an even number of cycles from the first where RX_DV is asserted (0 cycles, i.e., first nibble is the SFD, or 2, or ...).

It's a pity I misunderstood you. I spent a fair amount of time wading through 802.3 to check your assumption; turns out my assumption was the faulty one.

If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream. It doesn't seem to make sense to not do this.

Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream. It doesn't seem to make sense to not do this.

Yes this is essentially what I meant. If you wanted a generic component you'd want something like:

sfdDetector :: Signal (Maybe (BitVector n)) -> Signal (Maybe (BitVector n))

Which eats the input until a SFD is detected and then forwards. Similar to the original component except not yet in a byte aligned packet stream. And you also don't need to care about backpressure at this stage.

Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...

Yes RGMII is a nice case. You get a full byte every cycle. I just checked and it specifies:

Multiplexing of data and control information is done by taking advantage of both edges of the reference clocks and sending the lower 4 bits on the rising edge and the upper 4 bits on the falling edge. Control signals can be multiplexed into a single clock cycle using the same technique.

and

This interface can be used to implement the 10/100 Mbps Ethernet Media Independent Interface (MII) by reducing the clock rate to 25MHz for 100Mbps operation and 2.5MHz for 10Mbps. The TXC will always be generated by the MAC and RXC will be generated by the PHY. During packet reception, the RXC may be stretched on either the positive or negative pulse to accommodate the transition from the free running clock to a data-synchronous clock domain. When the speed of the PHY changes, a similar stretching
of the positive or negative pulses is allowed. No glitching of the clocks are allowed during speed transitions.
This interface will operate at 10 and 100Mbps speeds exactly the same way it does at Gigabit speed with the exception that the data may be duplicated on the falling edge of the appropriate clock. The MAC will hold TX_CTL low until it has ensured that it is operating at the same speed as the PHY.

So if I interpret it correctly even in 10/100 Mbps mode it will always full transmit bytes.

test/Test/Cores/Ethernet.hs Outdated Show resolved Hide resolved
It was only used to be able to test the ARP stack in hardware when we did not have IP yet.
Major documentation was missing for the ICMP module. Especially the fact that the checksum adjustment breaks for very specific (unlikely to happen in practice) input packets.
Mostly includes improvements in packet generation, and reuse of models and generators. Aside from that, most test files have also been formatted with fourmolu.
Also removed some superfluous constraints and made the timer of the ARP manager less granular. This caused the manual test to fail, because its domain is too slow. The manual test was not any better than a hardware test anyway, so I removed it. Once support for ReqResp tests arrives from clash-protocols, we can add a proper test.
@t-wallet
Copy link
Collaborator Author

ARP is all done and open for further review. Only thing missing is aborting upon receiving backpressure from the ARP receiver. The problem with that is that it does not run at full throughput currently, because that saves resources. I could generalize abortOnBackpressureC to only abort after seeing n cycles of backpressure in order to fix that.

t-wallet added a commit that referenced this pull request Sep 27, 2024
Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>

See the discussion in #8 (comment) for more details.
See the discussion in #8 (comment) for more details.

Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>
@t-wallet
Copy link
Collaborator Author

Split IPv4 checksum improvements to #14.

Only the padding inserter and frame check sequence inserter needed adjustments. The tests are adjusted to use the new packet generation method.
PacketStream was recently updated to not require null bytes to be set
to 0x00 anymore. The padding inserter and fcs validator require this,
so they needed a small change. Many other components are now more
efficient because of this relaxation, though.

Relevant commit: clash-lang/clash-protocols@2e9557d
There is now a separate type for subnet masks to avoid confusion. Also
added several utility functions for address classification.
Instead of multiplexing the input we can leverage clock enables to save
resources and improve timing. Also for some reason, internetChecksum did
not reuse onesComplementAdd.
Linouth pushed a commit that referenced this pull request Nov 22, 2024
See the discussion in #8 (comment) for more details.

Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>
Linouth pushed a commit that referenced this pull request Dec 5, 2024
See the discussion in #8 (comment) for more details.

Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>
Linouth pushed a commit that referenced this pull request Dec 19, 2024
See the discussion in #8 (comment) for more details.

Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants