AXI4 to Tilelink converter at RoCC Module #1187

Closed
3 tasks done
leduchuybk opened this issue Jun 20, 2022 · 20 comments

@leduchuybk

Background Work

Chipyard Version and Hash

Release: 1.5.0
Hash: a6a6a6

OS Setup

Ex: Output of uname -a and lsb_release -a
LSB Version: core-9.20170808ubuntu1-noarch:printing-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

Other Setup

Ex: Prior steps taken / Documentation Followed / etc...

Current Behavior

I want to create a converter for the AXI4 interface of my RoCC module so that it can connect to the L2 memory crossbar.
The whole implementation is shown below.
The AXI4 traffic is converted to TileLink and passed to the atlNode of the RoCC interface, which connects to the L2 memory crossbar of the system.
[my module] with AXI4 interface of RoCC module <-> AXI4ToTL node <-> atlNode (TileLink node) of the L2 memory crossbar.
design
The default RoCC interface signals may be classified into the following groups of signals

  • Core control (CC): for co-ordination between an accelerator and Rocket core
  • Register mode (Core): for exchange of data between an accelerator and Rocket core
  • Memory mode (Mem): for communication between an accelerator and L1-D Cache

The extended RoCC interface can provide for the following groups of signals

  • Uncached Tile Link (UTL): for communication between an accelerator & L2 memory
  • Floating Point Unit (FPU): for an accelerator to send and receive data from an FPU
  • Control Status Register (CSR): used by Linux on the core to recognize the accelerator
  • Page Table Walker (PTW): for address translation from an accelerator

However, elaboration keeps failing with an error saying "The following node was incorrectly connected as a source to TOP.memoryTap":

[error] java.lang.IllegalArgumentException: requirement failed: Diplomacy has detected a problem in your code:
[error] The following node was incorrectly connected as a source to TOP.memoryTap after its .module was evaluated at  at myRoCC.scala:28:19.
[error] source acc.atlNode node:
[error] parents: acc/tile/tile_reset_domain/tile_prci_domain/system/chiptop
[error] locator:  (myRoCC.scala:200:27)
[error]
[error] 1 outward nodes bound: [star-tlMasterXbar.node]

mRocketConfig.log

package freechips.rocketchip.tile

import chisel3._
//import chisel3.util._
import freechips.rocketchip.config._
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.tilelink._
import freechips.rocketchip.amba.axi4._

class mAcceleratorModuleImp(outer: mAccelerator)(implicit p: Parameters)
    extends LazyRoCCModuleImp(outer) {
        val mTop = Module(new mAcceleratorTop)

        mTop.io.rocc_rs1   := io.cmd.bits.rs1
        mTop.io.rocc_rs2   := io.cmd.bits.rs2
        mTop.io.rocc_valid := io.cmd.valid
        io.busy            := mTop.io.rocc_busy
        io.cmd.ready        := mTop.io.rocc_ready

        val memAXI4Node = AXI4MasterNode(Seq(AXI4MasterPortParameters(
            masters = Seq(AXI4MasterParameters(
                name = "rocc_maxi4",
            ))
          ))
        )

        val memoryTap = TLIdentityNode()
        memoryTap := outer.atlNode
        (memoryTap
          := AXI4ToTL()
          := AXI4UserYanker(Some(2))
          := AXI4Fragmenter()
          := memAXI4Node)

        memAXI4Node.out foreach { case (out, edgeOut) =>
            mTop.io.m_axi_aw_ready     := out.aw.ready
            out.aw.valid       := mTop.io.m_axi_aw_valid
            out.aw.bits.id     := mTop.io.m_axi_aw_bits_id
            out.aw.bits.addr   := mTop.io.m_axi_aw_bits_addr
            out.aw.bits.len    := mTop.io.m_axi_aw_bits_len
            out.aw.bits.size   := mTop.io.m_axi_aw_bits_size
            out.aw.bits.burst  := mTop.io.m_axi_aw_bits_burst
            out.aw.bits.lock   := mTop.io.m_axi_aw_bits_lock
            out.aw.bits.cache  := mTop.io.m_axi_aw_bits_cache
            out.aw.bits.prot   := mTop.io.m_axi_aw_bits_prot
            out.aw.bits.qos    := mTop.io.m_axi_aw_bits_qos
            // unused signals
            assert(mTop.io.m_axi_aw_bits_region === 0.U)
            assert(mTop.io.m_axi_aw_bits_atop === 0.U)
            assert(mTop.io.m_axi_aw_bits_user === 0.U)

            mTop.io.m_axi_w_ready     := out.w.ready
            out.w.valid       := mTop.io.m_axi_w_valid
            out.w.bits.data   := mTop.io.m_axi_w_bits_data
            out.w.bits.strb   := mTop.io.m_axi_w_bits_strb
            out.w.bits.last   := mTop.io.m_axi_w_bits_last
            // unused signals
            assert(mTop.io.m_axi_w_bits_user === 0.U)

            out.b.ready       := mTop.io.m_axi_b_ready
            mTop.io.m_axi_b_valid     := out.b.valid
            mTop.io.m_axi_b_bits_id   := out.b.bits.id
            mTop.io.m_axi_b_bits_resp := out.b.bits.resp
            mTop.io.m_axi_b_bits_user := 0.U // unused

            mTop.io.m_axi_ar_ready    := out.ar.ready
            out.ar.valid      := mTop.io.m_axi_ar_valid
            out.ar.bits.id    := mTop.io.m_axi_ar_bits_id
            out.ar.bits.addr  := mTop.io.m_axi_ar_bits_addr
            out.ar.bits.len   := mTop.io.m_axi_ar_bits_len
            out.ar.bits.size  := mTop.io.m_axi_ar_bits_size
            out.ar.bits.burst := mTop.io.m_axi_ar_bits_burst
            out.ar.bits.lock  := mTop.io.m_axi_ar_bits_lock
            out.ar.bits.cache := mTop.io.m_axi_ar_bits_cache
            out.ar.bits.prot  := mTop.io.m_axi_ar_bits_prot
            out.ar.bits.qos   := mTop.io.m_axi_ar_bits_qos
            // unused signals
            assert(mTop.io.m_axi_ar_bits_region === 0.U)
            assert(mTop.io.m_axi_ar_bits_user === 0.U)

            out.r.ready       := mTop.io.m_axi_r_ready
            mTop.io.m_axi_r_valid     := out.r.valid
            mTop.io.m_axi_r_bits_id   := out.r.bits.id
            mTop.io.m_axi_r_bits_data := out.r.bits.data
            mTop.io.m_axi_r_bits_resp := out.r.bits.resp
            mTop.io.m_axi_r_bits_last := out.r.bits.last
            mTop.io.m_axi_r_bits_user := 0.U // unused
  }
}

class mAcceleratorTop extends Module{
    val io = IO (new Bundle{
        val rocc_ready = Output(Bool())
        val rocc_rs1   = Input(UInt(64.W))
        val rocc_rs2   = Input(UInt(64.W))
        val rocc_valid = Input(Bool())
        val rocc_busy  = Output(Bool())

        val m_axi_aw_ready      = Input(Bool())
        val m_axi_aw_valid       = Output(Bool())
        val m_axi_aw_bits_id     = Output(UInt(7.W))
        val m_axi_aw_bits_addr   = Output(UInt(32.W))
        val m_axi_aw_bits_len    = Output(UInt(8.W))
        val m_axi_aw_bits_size   = Output(UInt(3.W))
        val m_axi_aw_bits_burst  = Output(UInt(2.W))
        val m_axi_aw_bits_lock   = Output(Bool())
        val m_axi_aw_bits_cache  = Output(UInt(4.W))
        val m_axi_aw_bits_prot   = Output(UInt(3.W))
        val m_axi_aw_bits_qos    = Output(UInt(4.W))
        val m_axi_aw_bits_region = Output(UInt(4.W))
        val m_axi_aw_bits_atop   = Output(UInt(6.W))
        val m_axi_aw_bits_user   = Output(UInt(7.W))

        val m_axi_w_ready    = Input(Bool())
        val m_axi_w_valid     = Output(Bool())
        val m_axi_w_bits_data = Output(UInt(64.W))
        val m_axi_w_bits_strb = Output(UInt((64/8).W))
        val m_axi_w_bits_last = Output(Bool())
        val m_axi_w_bits_user = Output(UInt(7.W))

        val m_axi_ar_ready      = Input(Bool())
        val m_axi_ar_valid       = Output(Bool())
        val m_axi_ar_bits_id     = Output(UInt(7.W))
        val m_axi_ar_bits_addr   = Output(UInt(32.W))
        val m_axi_ar_bits_len    = Output(UInt(8.W))
        val m_axi_ar_bits_size   = Output(UInt(3.W))
        val m_axi_ar_bits_burst  = Output(UInt(2.W))
        val m_axi_ar_bits_lock   = Output(Bool())
        val m_axi_ar_bits_cache  = Output(UInt(4.W))
        val m_axi_ar_bits_prot   = Output(UInt(3.W))
        val m_axi_ar_bits_qos    = Output(UInt(4.W))
        val m_axi_ar_bits_region = Output(UInt(4.W))
        val m_axi_ar_bits_user   = Output(UInt(7.W))

        val m_axi_b_ready      = Output(Bool())
        val m_axi_b_valid     = Input(Bool())
        val m_axi_b_bits_id   = Input(UInt(7.W))
        val m_axi_b_bits_resp = Input(UInt(2.W))
        val m_axi_b_bits_user = Input(UInt(7.W))

        val m_axi_r_ready     = Output(Bool())
        val m_axi_r_valid     = Input(Bool())
        val m_axi_r_bits_id   = Input(UInt(7.W))
        val m_axi_r_bits_data = Input(UInt(64.W))
        val m_axi_r_bits_resp = Input(UInt(2.W))
        val m_axi_r_bits_user = Input(UInt(7.W))
        val m_axi_r_bits_last = Input(Bool())
    })
    io.rocc_busy  := !io.rocc_valid
    io.rocc_ready := io.rocc_valid

    io.m_axi_aw_valid       := false.B
    io.m_axi_aw_bits_id     := "h0".U
    io.m_axi_aw_bits_addr   := "h0".U
    io.m_axi_aw_bits_len    := "h0".U
    io.m_axi_aw_bits_size   := "h0".U
    io.m_axi_aw_bits_burst  := "h0".U
    io.m_axi_aw_bits_lock   := "h0".U
    io.m_axi_aw_bits_cache  := "h0".U
    io.m_axi_aw_bits_prot   := "h0".U
    io.m_axi_aw_bits_qos    := "h0".U
    io.m_axi_aw_bits_region := "h0".U
    io.m_axi_aw_bits_atop   := "h0".U
    io.m_axi_aw_bits_user   := "h0".U

    io.m_axi_w_valid        := false.B
    io.m_axi_w_bits_data    := "h0".U
    io.m_axi_w_bits_strb    := "h0".U
    io.m_axi_w_bits_last    := false.B
    io.m_axi_w_bits_user    := "h0".U

    io.m_axi_ar_valid       := false.B
    io.m_axi_ar_bits_id     := "h0".U
    io.m_axi_ar_bits_addr   := "h0".U
    io.m_axi_ar_bits_len    := "h0".U
    io.m_axi_ar_bits_size   := "h0".U
    io.m_axi_ar_bits_burst  := "h0".U
    io.m_axi_ar_bits_lock   := false.B
    io.m_axi_ar_bits_cache  := "h0".U
    io.m_axi_ar_bits_prot   := "h0".U
    io.m_axi_ar_bits_qos    := "h0".U
    io.m_axi_ar_bits_region := "h0".U
    io.m_axi_ar_bits_user   := "h0".U

    io.m_axi_b_ready        := false.B

    io.m_axi_r_ready        := false.B

}

class  mAccelerator(opcodes: OpcodeSet)(implicit p: Parameters) extends LazyRoCC(opcodes) {
  override lazy val module = new mAcceleratorModuleImp(this)
  override val atlNode = TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLMasterParameters.v1("rocc_atl")))))
}

class WithmAccelerator extends Config ((site, here, up) =>
{
  case BuildRoCC => Seq(
    (p: Parameters) => {
      val acc = LazyModule(new mAccelerator(OpcodeSet.custom0)(p))
      acc
  })
})

Expected Behavior

I would like to know how to connect a custom accelerator module that has an AXI4 interface to the atlNode of the RoCC interface.

Other Information

No response

@leduchuybk leduchuybk added the bug label Jun 20, 2022
@jerryz123
Contributor

Try
outer.atlNode := memoryTapNode

@leduchuybk
Author

Thank you @jerryz123 for the comment.
I tried your solution, but I still get an error, as follows:
[error] java.lang.IllegalArgumentException: requirement failed: Diplomacy has detected a problem in your code:
[error] The following node was incorrectly connected as a sink to mTop.memoryTap after its .module was evaluated ....

@michael-etzkorn
Contributor

michael-etzkorn commented Jun 23, 2022


        val memAXI4Node = AXI4MasterNode(Seq(AXI4MasterPortParameters(
            masters = Seq(AXI4MasterParameters(
                name = "rocc_maxi4",
            ))
          ))
        )

        val memoryTap = TLIdentityNode()
        memoryTap := outer.atlNode
        (memoryTap
          := AXI4ToTL()
          := AXI4UserYanker(Some(2))
          := AXI4Fragmenter()
          := memAXI4Node)

Two things:

  • The code snippet above should all be in mAccelerator and not in the LazyModuleImp since these are diplomatic connections.

AFAIK, Diplomacy won't work like you expect it to in the LazyModuleImp.

  • The flow here doesn't make sense.

You're connecting a source to another source. If you wanna use your source, simply ignore the TLClientNode. If that source points to some sink outside of this (probably system xbar? I haven't looked at LazyRoCC's code yet), you need to point your source to that as well (assuming it's a xbar or add a xbar if necessary).

@michael-etzkorn
Contributor

michael-etzkorn commented Jun 23, 2022

LazyRoCC code snippet:
https://github.com/chipsalliance/rocket-chip/blob/114325b27cfe5312c86a8a325b187be9455a62af/src/main/scala/tile/LazyRoCC.scala#L57-L81

abstract class LazyRoCC(
      val opcodes: OpcodeSet,
      val nPTWPorts: Int = 0,
      val usesFPU: Boolean = false
    )(implicit p: Parameters) extends LazyModule {
  val module: LazyRoCCModuleImp
  val atlNode: TLNode = TLIdentityNode()
  val tlNode: TLNode = TLIdentityNode()
}

class LazyRoCCModuleImp(outer: LazyRoCC) extends LazyModuleImp(outer) {
  val io = IO(new RoCCIO(outer.nPTWPorts))
}

/** Mixins for including RoCC **/

trait HasLazyRoCC extends CanHavePTW { this: BaseTile =>
  val roccs = p(BuildRoCC).map(_(p))

  roccs.map(_.atlNode).foreach { atl => tlMasterXbar.node :=* atl }
  roccs.map(_.tlNode).foreach { tl => tlOtherMastersNode :=* tl }

  nPTWPorts += roccs.map(_.nPTWPorts).sum
  nDCachePorts += roccs.size
}

It looks like the LazyRoCC atlNode is already an identity node. Don't override that with a ClientNode! Just replace memoryTap with atlNode.

Probably to get diplomacy to work as intended, you'll want to move those diplomatic connections to the LazyModule.

@leduchuybk
Author

Thanks @michael-etzkorn for the comment.

You pointed out two things in my code.
First, you said that the AXI4-to-TileLink conversion code should not be in mAcceleratorModuleImp but in mAccelerator, because those lines are diplomatic connections. However, after I tried to move it out as below:

class mAccelerator(opcodes: OpcodeSet) (implicit p: Parameters) extends LazyRoCC(opcodes)
{
override lazy val module = Module(new mAcceleratorModuleImp(this))
(atlNode := AXI4ToTL() := AXI4UserYanker(Some(2)) := AXI4Fragmenter() := module.memAXI4Node)
}

I got the error below:

java.lang.IllegalArgumentException: requirement failed: mAccelerator.module was constructed before LazyModule() was run on mAccelerator

Secondly, I understood your explanation about two sources being connected to each other. The initial code was wrong. My intention is to connect a source to a sink: the source in mAccelerator is memAXI4Node, and the sink I want to connect it to is the system xbar.

I would really appreciate it if you could try these files or write some sample code that I can try on my own.

@michael-etzkorn
Contributor

michael-etzkorn commented Jun 24, 2022

So I might be off here, but intuition tells me you're seeing that error because the diplomatic connections have to be above the declaration of the module. Move the master node declaration outside of the implementation as well.

        val memAXI4Node = AXI4MasterNode(Seq(AXI4MasterPortParameters(
            masters = Seq(AXI4MasterParameters(
                name = "rocc_maxi4",
            ))
          ))
        )

should be in the outer module. Reference it from within the LazyModuleImp as outer.memAXI4Node.

Also, there should be no reason to override module because it's an abstract member of LazyRoCC.

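class mAccelerator(opcodes: OpcodeSet)(implicit p: Parameters) extends LazyRoCC(opcodes)
{
  // The AXI4 master node is declared in the outer LazyModule
  val memAXI4Node = AXI4MasterNode(Seq(AXI4MasterPortParameters(
    masters = Seq(AXI4MasterParameters(
      name = "rocc_maxi4",
    ))
  )))
  // Diplomatic connection: AXI4 source -> converters -> inherited atlNode
  (atlNode := AXI4ToTL() := AXI4UserYanker(Some(2)) := AXI4Fragmenter() := memAXI4Node)
  // A LazyModuleImp is created with plain `new`, not wrapped in Module(...)
  lazy val module = new mAcceleratorModuleImp(this)
}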

Then you could replace the declaration in ModuleImp with val memAXI4Node = outer.memAXI4Node.
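
For readers following along, the LazyModuleImp side of that suggestion might look roughly like the fragment below. It is only a sketch: it assumes the mAccelerator and mAcceleratorTop classes and the imports from the original post, it is not standalone code, and only the read-address channel is shown.

```scala
// Sketch only: relies on mAccelerator, mAcceleratorTop and the imports
// from the original post (rocket-chip as shipped with Chipyard 1.5.0).
class mAcceleratorModuleImp(outer: mAccelerator)(implicit p: Parameters)
    extends LazyRoCCModuleImp(outer) {
  val mTop = Module(new mAcceleratorTop)

  // The node itself lives in the outer LazyModule; here we only take its bundle.
  // `.out.head` is the (bundle, edge) pair of the single AXI4 master port.
  val (axi, _) = outer.memAXI4Node.out.head

  // Read-address channel as an example; the remaining channels are wired
  // the same way as in the original post.
  axi.ar.valid           := mTop.io.m_axi_ar_valid
  axi.ar.bits.addr       := mTop.io.m_axi_ar_bits_addr
  mTop.io.m_axi_ar_ready := axi.ar.ready
}
```

The point of this layout is that the hardware reaches outward through `outer`, so the outer class never has to touch `module` while it is still being constructed.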

Your config is the first time I've seen that approach with a key so I don't know if that works, but since Jerry didn't comment on it, I assume that's how BuildRoCC key is set? After fixing up the LazyModule outer diplomatic connections to not reference module and instead have the module reference outer, that's the next place I'd look. I'd love to debug for you (time permitting), but trust me when I say taking the time to work these bugs out for yourself will improve your ability to work with diplomatic connections immensely 😄

@leduchuybk
Author

@michael-etzkorn
Thanks a bundle.
You are right: working through this bug does help improve my understanding of diplomatic connections.
On the other point, the reason I override module is that the example code in LazyRoCC.scala does it.
For anyone running into a similar problem, my code is as follows, and it works:

class mAccelerator(opcodes: OpcodeSet) (implicit p: Parameters) extends LazyRoCC(opcodes)
{
       override lazy val module = new mAcceleratorModuleImp(this)
       val memAXI4Node = AXI4MasterNode(Seq(AXI4MasterPortParameters(
            masters = Seq(AXI4MasterParameters(
                name = "rocc_maxi4",
            ))
          ))
       )
       (atlNode := AXI4ToTL() := AXI4UserYanker(Some(2)) := AXI4Fragmenter() := module.memAXI4Node)
}

@michael-etzkorn
Contributor

michael-etzkorn commented Jun 26, 2022

I'm still not quite sure how I feel about referring to a node within the LazyModuleImp from the LazyModule, i.e. the module.memAXI4Node in your final line of code there.

The opposite is usually done, i.e. within the LazyModuleImp you connect the hardware to the node memAXI4Node using outer. But if it works, it works.

As a general rule of thumb

Diplomatic parameters and pure software constructs go in the LazyModule (such as the diplomatic nodes) and the LazyModuleImp instantiates the actual hardware.
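
The concern about touching `module` from the outer class can be illustrated with a plain-Scala analogy. This is not rocket-chip code and does not reproduce Diplomacy's actual check; `Imp`, `EagerOuter` and `DeferredOuter` are hypothetical names. It only shows how reading a `lazy val` inside a constructor forces it before the enclosing object is finished.

```scala
// Stand-in for a LazyModuleImp: it refuses to be built until the outer object is done.
class Imp(outerReady: () => Boolean) {
  require(outerReady(), "module was constructed before the outer class finished")
  val node: String = "memAXI4Node"
}

// The constructor body reads `module`, which forces the lazy val
// while the outer object is still being initialised.
class EagerOuter {
  private var ready = false
  lazy val module = new Imp(() => ready)
  val wired: String = module.node // forces `module` too early, so `require` fails
  ready = true
}

// The node lives in the outer class, so the constructor never touches `module`.
class DeferredOuter {
  private var ready = false
  val node: String = "memAXI4Node"
  lazy val module = new Imp(() => ready)
  val wired: String = node
  ready = true
}
```

Constructing `EagerOuter` throws an `IllegalArgumentException`, while `new DeferredOuter().module` is built without complaint because it is first requested after the outer constructor has returned.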

The override was probably to turn the identity node into a TLClient for that use case, but here you can just use the identity node. I like to think of Identity Nodes as a sort of way point node between a source and a sink. Perfect for what you were trying to do which is connect an AXI source to some TL Node that can be connected to memory by a Tile.

Glad the design's elaborating! Best of luck with the rest of your project!

@leduchuybk
Author

@michael-etzkorn thank you for your help.
I was able to generate a bitstream for this design. However, the Rocket design with the L2 cache fails timing badly. Therefore, I want to connect my AXI4 interface directly to L1 through mem (LazyRoCC.scala: line 44).
Will the conversion between the AXI4 node and mem be the same as with atlNode?

@michael-etzkorn
Contributor

michael-etzkorn commented Jul 16, 2022

I don't believe mem here is a diplomatic node. The IO connections and glue logic would look different.

  • If the L2 cache is giving you timing trouble, the best thing to do for timing is to remove it which can be done by adding WithNBanks(0) to your config.

  • If you need to keep the coherence, you can try WithBroadcastManager instead.

@leduchuybk
Author

leduchuybk commented Jul 19, 2022

@michael-etzkorn

  • Unfortunately, my RoCC module uses atlNode to connect to memory (in this case, tlMasterXbar). Adding 'WithNBanks(0)' to the config produces the following error:

[error] java.util.NoSuchElementException: key not found: Location(system_mbus)

  • About the second option, coherence is still a new term to me. Where can I find examples or docs about it? I didn't find any information in the Chipyard tutorial.

thank you

@michael-etzkorn
Contributor

michael-etzkorn commented Jul 19, 2022

Yeah, I was looking at this for a separate issue. There needs to be a better error message here. I can help look into that soon enough.


Coherence is a cache concept for ensuring clients accessing separate caches can still access the most recent data by whatever memory ordering model is being used. The caches are said to be coherent if they both reflect the most recent [1] write transaction.

https://en.wikipedia.org/wiki/Cache_coherence

Using the Broadcast Manager should be more lightweight than the L2 Inclusive Cache.

Footnotes

  1. recent here sort of depends on the memory order model used. I believe RISC-V is relaxed, but without nuance, you can think of it as the last write transaction among clients to the same cached address.
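
To make the idea concrete, here is a toy model in plain Scala. It is only an illustration of stale data versus invalidation, under simplifying assumptions (write-through, two clients, no timing); it does not describe how the Broadcast Manager or the L2 Inclusive Cache are implemented, and `ToySystem` and `invalidateOthers` are hypothetical names.

```scala
import scala.collection.mutable

// Shared memory plus one private cache per client (write-through for simplicity).
class ToySystem(invalidateOthers: Boolean) {
  val memory = mutable.Map[Int, Int]().withDefaultValue(0)
  val caches = Vector.fill(2)(mutable.Map[Int, Int]())

  // A read hits the client's own cache if possible, otherwise fills it from memory.
  def read(client: Int, addr: Int): Int =
    caches(client).getOrElseUpdate(addr, memory(addr))

  // A write updates memory and the writer's cache; with `invalidateOthers`
  // set, every other client's copy of that address is dropped.
  def write(client: Int, addr: Int, data: Int): Unit = {
    memory(addr) = data
    caches(client)(addr) = data
    if (invalidateOthers)
      caches.indices.filter(_ != client).foreach(c => caches(c).remove(addr))
  }
}
```

If client 1 reads an address, client 0 then writes it, and client 1 reads again, the second read returns the old value when `invalidateOthers` is false and the new value when it is true. The second behaviour is what "coherent" means in the paragraph above.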

@leduchuybk
Author

My config is as follows:

class myConfig extends Config(
    new chipyard.WithMulticlockIncoherentBusTopology ++    // use incoherent bus
    new freechips.rocketchip.subsystem.WithNBanks(0) ++    // remove L2
    new freechips.rocketchip.tile.WithmyRoCC ++            // my RoCC
    new freechips.rocketchip.subsystem.WithNMedCores(1) ++ // single medium Rocket core
    new chipyard.config.AbstractConfig)

The error is as follows:

[error] java.util.NoSuchElementException: key not found: Location(subsystem_mbus)
[error] at freechips.rocketchip.util.LocationMap.default(Location.scala:21)
[error] at scala.collection.MapLike.apply(MapLike.scala:144)
[error] at scala.collection.MapLike.apply$(MapLike.scala:143)
[error] at freechips.rocketchip.util.LocationMap.apply(Location.scala:21)
[error] at freechips.rocketchip.subsystem.HasTileLinkLocations.locateTLBusWrapper(Attachable.scala:39)
[error] at freechips.rocketchip.subsystem.HasTileLinkLocations.locateTLBusWrapper$(Attachable.scala:39)
[error] at freechips.rocketchip.subsystem.BaseSubsystem.locateTLBusWrapper(BaseSubsystem.scala:72)
[error] at freechips.rocketchip.subsystem.Attachable.locateTLBusWrapper(Attachable.scala:58)
[error] at freechips.rocketchip.subsystem.Attachable.locateTLBusWrapper$(Attachable.scala:58)
[error] at freechips.rocketchip.subsystem.BaseSubsystem.locateTLBusWrapper(BaseSubsystem.scala:72)
[error] at testchipip.CanHavePeripheryTLSerial.$anonfun$x$6$1(SerialAdapter.scala:339)
[error] at testchipip.CanHavePeripheryTLSerial$$Lambda$8378/562780479.apply(Unknown Source)
[error] at scala.Option.map(Option.scala:230)
[error] at testchipip.CanHavePeripheryTLSerial.(SerialAdapter.scala:336)
[error] at chipyard.DigitalTop.(DigitalTop.scala:15)
[error] at chipyard.BuildSystem$$lessinit$greater$1.apply(ChipTop.scala:16)
[error] at chipyard.BuildSystem$$lessinit$greater$1.apply(ChipTop.scala:16)
[error] at chipyard.ChipTop.lazySystem$lzycompute(ChipTop.scala:32)

@leduchuybk
Author

From your comment, I imagine that the core is client 1 and my RoCC module is client 2. I do not think my design needs coherence.

Thank you

@michael-etzkorn
Contributor

michael-etzkorn commented Jul 20, 2022

Is the Broadcast Manager with your RoCC also facing timing issues? I haven't had a chance to look into how to remove the L2 cache beyond WithNBanks(0) or replacing it with WithBroadcastManager.

Hopefully @jerryz123 can weigh in. It seems like this may involve chipsalliance/rocket-chip#2978, or I'm just missing something with taking out the coherence manager.

@jerryz123
Contributor

Use WithBroadcastManager instead of WithNBanks and WithIncoherentBusTopology

@leduchuybk
Author

With @jerryz123's suggestion, I was able to run simulation with my config as follows:

class myConfig extends Config(
    new freechips.config.WithBroadcastManager ++                     // use broadcast 
    new freechips.rocketchip.tile.WithmyRoCC ++                        // my RoCC
    new freechips.rocketchip.subsystem.WithNMedCores(1) ++  // Single median rocket core
    new chipyard.config.AbstractConfig) 

However, when I try to use this config in an FPGA project, I get an error like the one in #1169.
My configuration is as follows:

class myRocketConfig extends Config (
    new WithArtyTweaks ++ // ChangeTestharness IO to match Arty board (please check file attached)
    new WithFPGABootROM ++ // my custom boot ROM
    new freechips.config.WithBroadcastManager ++                     // use broadcast 
    new freechips.rocketchip.tile.WithmyRoCC ++                        // my RoCC
    new freechips.rocketchip.subsystem.WithNMedCores(1) ++  // Single median rocket core
    new chipyard.fpga.arty_a7_100.AbstractConfig // AbstractConfig for FPGA board (please check file attached)
)

Configs.txt

@leduchuybk
Author

Based on what @michael-etzkorn commented on #1169, I set the prot and cache signals on memAXI4Node as follows:

out.ar.bits.prot  := "h0".U
out.ar.bits.cache := "h0".U
out.aw.bits.prot  := "h0".U
out.aw.bits.cache := "h0".U

However, it still shows the same error, related to the DDR memory connection in this class:

class WithDDRMem extends OverrideHarnessBinder({
  (system: CanHaveMasterTLMemPort, th: BaseModule with HasHarnessSignalReferences, ports: Seq[HeterogeneousBag[TLBundle]]) => {
    th match { case artyth: Arty100TFPGATestHarnessImp => {
      require(ports.size == 1)

      val bundles = artyth.artyOuter.ddrClient.out.map(_._1)
      val ddrClientBundle = Wire(new HeterogeneousBag(bundles.map(_.cloneType)))
      bundles.zip(ddrClientBundle).foreach { case (bundle, io) => bundle <> io }
      ddrClientBundle <> ports.head
    } }
  }
})

@michael-etzkorn
Contributor

It has to be done on the ddrClientBundle which is a TLBundle where the user AMBAProt fields aren't necessarily initialized.

@leduchuybk
Author

@michael-etzkorn I added the following lines to class WithDDRMem and it worked:

// Drive the AMBAProt user fields on the TileLink A channel, if present
ddrClientBundle.head.a.bits.user.lift(AMBAProt).foreach { x =>
  x.privileged := true.B
  x.secure     := true.B
  x.fetch      := false.B
  x.bufferable := true.B
  x.modifiable := true.B
  x.readalloc  := true.B
  x.writealloc := true.B
}

However, I still wonder whether these values are correct. My system uses only the L1 cache, so I assumed that it needs "read and write allocate", according to Table A4-5 (Memory type encoding) of the AXI4 specification.
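
One way to sanity-check these values is to compute the AxCACHE nibble they would correspond to. The sketch below is plain Scala, not rocket-chip code. It uses the AXI4 bit order (AxCACHE[0] Bufferable, AxCACHE[1] Modifiable, AxCACHE[2] and AxCACHE[3] the two allocate bits) and assumes that the TileLink-to-AXI4 conversion places bufferable, modifiable, readalloc and writealloc on bits 0 to 3 in that order. `AxCache` and `encode` are hypothetical names.

```scala
object AxCache {
  // Pack the four memory-attribute flags into a 4-bit AxCACHE value.
  // Assumed bit positions: 0 = bufferable, 1 = modifiable,
  // 2 = read-allocate, 3 = write-allocate.
  def encode(bufferable: Boolean, modifiable: Boolean,
             readAlloc: Boolean, writeAlloc: Boolean): Int = {
    def bit(flag: Boolean, pos: Int): Int = if (flag) 1 << pos else 0
    bit(bufferable, 0) | bit(modifiable, 1) | bit(readAlloc, 2) | bit(writeAlloc, 3)
  }
}
```

With all four flags set, as in the snippet above, `AxCache.encode(true, true, true, true)` gives 0b1111, which Table A4-5 lists as "Write-back Read and Write-allocate" for both ARCACHE and AWCACHE. All flags cleared gives 0b0000, "Device Non-bufferable".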
image
