Toolchain Overview

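RISC-V, being a relatively new ISA, has never been fully implemented. To develop on the existing toolchain, readers therefore need to be familiar with its many workarounds and undocumented features.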

RISC-V Tools

The official riscv-tools consists of the following components:

  • riscv-gnu-toolchain, a RISC-V cross-compiler
  • riscv-fesvr, a "front-end" server that services calls between the host and target processors on the Host-Target InterFace (HTIF) (it also provides a virtualized console and disk device)
  • riscv-isa-sim, the ISA simulator and "golden standard" of execution
  • riscv-opcodes, the enumeration of all RISC-V opcodes executable by the simulator
  • riscv-pk, a proxy kernel that services system calls generated by code built and linked with the RISC-V Newlib port (this does not apply to Linux, as it handles the system calls)
  • riscv-tests, a set of assembly tests and benchmarks

The main undocumented features of these tools are summarized below; for details, see the porting notes for each lab.

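Tool                  Features
riscv-gnu-toolchain   Passing the -b binary option at build time makes ld crash with a segmentation fault
riscv-fesvr           The so-called disk device does not exist
riscv-isa-sim         The rdtime instruction is not implemented; runs at 2 MHz
riscv-pk              The proxy kernel does not support 32-bit programs; contains many workarounds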

All About BBL

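Everything discussed below applies only to the Spike simulator and the corresponding spike-board implementation in QEMU; it may not hold on other RISC-V platforms.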

Compilation

bbl uses Autotools as its build system. The build procedure is as follows:

$ mkdir build && cd build
$ ../configure --prefix=$RISCV --host=riscv32-unknown-linux-gnu --with-payload=/path/to/kernel
$ make

If the --with-payload option is not passed, dummy_payload is used by default. Readers should look at bbl/payload.S and dummy_payload/ to get a first idea of how bbl loads the kernel.

payload.S is as follows:

.section ".payload","a",@progbits
.align 3

.globl _payload_start, _payload_end
_payload_start:
.incbin BBL_PAYLOAD
_payload_end:

Note that .align 3 does not mean 3-byte alignment; it means alignment to $2^3$ = 8 bytes.
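To make the power-of-two meaning concrete, the following C sketch rounds an address up to a $2^n$-byte boundary, which is the effect .align n has on the location counter here. The helper name align_up_pow2 is hypothetical and is not part of bbl.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^log2_bytes.
 * Hypothetical helper for illustration; not part of bbl. */
static uint32_t align_up_pow2(uint32_t addr, unsigned log2_bytes)
{
    assert(log2_bytes < 32);
    uint32_t mask = ((uint32_t)1 << log2_bytes) - 1;
    return (addr + mask) & ~mask;
}
```

For instance, align_up_pow2(0x80000005, 3) moves the address to 0x80000008, the next 8-byte boundary.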

Linker Script

The linker script of bbl is as follows:

OUTPUT_ARCH( "riscv" )

ENTRY( reset_vector )

SECTIONS
{
  /*--------------------------------------------------------------------*/
  /* Code and read-only segment                                         */
  /*--------------------------------------------------------------------*/

  /* Beginning of code and text segment */
  . = 0x80000000;
  _ftext = .;
  PROVIDE( eprol = . );

  .text :
  {
    *(.text.init)
  }

  /* text: Program code section */
  .text : 
  {
    *(.text)
    *(.text.*)
    *(.gnu.linkonce.t.*)
  }

  /* rodata: Read-only data */
  .rodata : 
  {
    *(.rdata)
    *(.rodata)
    *(.rodata.*)
    *(.gnu.linkonce.r.*)
  }

  /* End of code and read-only segment */
  PROVIDE( etext = . );
  _etext = .;

  /*--------------------------------------------------------------------*/
  /* HTIF, isolated onto separate page                                  */
  /*--------------------------------------------------------------------*/
  . = ALIGN(0x1000);
  htif :
  {
    *(htif)
  }
  . = ALIGN(0x1000);

  /*--------------------------------------------------------------------*/
  /* Initialized data segment                                           */
  /*--------------------------------------------------------------------*/

  /* Start of initialized data segment */
  . = ALIGN(16);
   _fdata = .;

  /* data: Writable data */
  .data : 
  {
    *(.data)
    *(.data.*)
    *(.srodata*)
    *(.gnu.linkonce.d.*)
    *(.comment)
  }

  /* End of initialized data segment */
  . = ALIGN(4);
  PROVIDE( edata = . );
  _edata = .;

  /*--------------------------------------------------------------------*/
  /* Uninitialized data segment                                         */
  /*--------------------------------------------------------------------*/

  /* Start of uninitialized data segment */
  . = .;
  _fbss = .;

  /* sbss: Uninitialized writeable small data section */
  . = .;

  /* bss: Uninitialized writeable data section */
  . = .;
  _bss_start = .;
  .bss : 
  {
    *(.bss)
    *(.bss.*)
    *(.sbss*)
    *(.gnu.linkonce.b.*)
    *(COMMON)
  }

  .sbi :
  {
    *(.sbi)
  }

  .payload :
  {
    *(.payload)
  }

  _end = .;
}

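After power-on, the CPU executes the first instruction at 0x00001000, which uses auipc to jump to 0x80000000, where bbl's startup code begins. As the script shows, the entry point of bbl is reset_vector, a symbol defined in machine/mentry.S. Three other sections in the linker script deserve attention: htif, .sbi and .payload, which come from machine/mtrap.c, sbi_entry.S and bbl/payload.S respectively.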

Loading Kernel

void boot_loader()
{
  extern char _payload_start, _payload_end;
  load_kernel_elf(&_payload_start, &_payload_end - &_payload_start, &info);
  supervisor_vm_init();
#ifdef PK_ENABLE_LOGO
  print_logo();
#endif
  mb();
  elf_loaded = 1;
  enter_supervisor_mode((void *)info.entry, 0);
}

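After compilation, our kernel is packed into the generated bbl as a binary ELF file, and its start and end addresses are _payload_start and _payload_end respectively. BBL reads the kernel and unpacks it into memory; see bbl/kernel_elf.c for the details. BBL then uses the information obtained from the ELF to build a basic page table for the kernel and maps the SBI onto the last page of the virtual address space. Finally, the enter_supervisor_mode function hands control to the kernel and enters S-mode.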
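As a rough illustration of the first step, the sketch below checks that an in-memory image is a 32-bit little-endian ELF and reads its entry point, which is the kind of information that ends up in info.entry. It is a minimal sketch, not bbl's code: the function elf32_entry and the two macros are hypothetical names, and the real load_kernel_elf does considerably more work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes and offsets within a 32-bit ELF header. */
#define ELF32_HEADER_SIZE  52
#define ELF32_ENTRY_OFFSET 24

/* Read the entry point of a 32-bit little-endian ELF image held in memory.
 * Returns 0 on success, -1 if the image is too short or not such an ELF.
 * Hypothetical helper for illustration; not part of bbl. */
static int elf32_entry(const unsigned char *image, size_t size, uint32_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(image != NULL && entry != NULL);
    if (size < ELF32_HEADER_SIZE || memcmp(image, magic, sizeof magic) != 0)
        return -1;
    if (image[4] != 1 /* ELFCLASS32 */ || image[5] != 1 /* ELFDATA2LSB */)
        return -1;

    /* Assemble the little-endian 32-bit e_entry field byte by byte. */
    const unsigned char *p = image + ELF32_ENTRY_OFFSET;
    *entry = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return 0;
}
```

In the setting above, the image pointer would correspond to &_payload_start and the size to &_payload_end - &_payload_start.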

Supervisor Binary Interface

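As mentioned earlier, RISC-V uses binary interfaces to abstract the underlying environment, which makes virtualization at every level easier to implement. The idea itself is excellent; unfortunately, up to and including Privileged ISA Specification v1.9.1, the approach taken to implement the SBI was flawed. To explain why, we first introduce a little more of the RISC-V ISA.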

Memory Management

A 32-bit Unix-like operating system needs only two memory management modes:

  • Mbare: Physical Addresses
  • Sv32: Page-Based 32-bit Virtual-Memory Systems

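Mbare is used by default. To enable Sv32, write 8 (binary 01000) to the VM field of the mstatus register; from then on, entering S-mode automatically uses paged address translation. Three points need attention: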

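  • M-mode always uses Mbare memory management
  • mstatus is an M-mode-only register, and the sstatus register available in S-mode has no VM field; readers puzzled by the sudden mention of sstatus should read Section 3.1.6 of Privileged ISA Specification v1.9.1
  • The physical page number of the page table base is stored in the sptbr register, which is an S-mode register that can be read and written in both M-mode and S-mode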
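The VM field manipulation can be sketched in C as follows. This is an illustration under stated assumptions: it takes the v1.9.1 layout in which the VM field occupies bits 28:24 of mstatus and Sv32 is encoded as 8, and it works on a plain integer rather than on the real CSR. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout assumed from Privileged ISA Specification v1.9.1:
 * mstatus.VM occupies bits 28:24. */
#define MSTATUS_VM_SHIFT 24
#define MSTATUS_VM_MASK  ((uint32_t)0x1f << MSTATUS_VM_SHIFT)
#define VM_MBARE 0u
#define VM_SV32  8u

/* Return mstatus with its VM field replaced by vm. */
static uint32_t mstatus_with_vm(uint32_t mstatus, uint32_t vm)
{
    assert(vm <= 0x1f);
    return (mstatus & ~MSTATUS_VM_MASK) | (vm << MSTATUS_VM_SHIFT);
}

/* Extract the VM field from an mstatus value. */
static uint32_t mstatus_vm(uint32_t mstatus)
{
    return (mstatus & MSTATUS_VM_MASK) >> MSTATUS_VM_SHIFT;
}
```

On real hardware the resulting value would be written back with a CSR write from M-mode, which is exactly why S-mode software cannot make this choice itself.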

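Readers who remember the single responsibility principle from their OOP course should realize that letting the M-mode software (the SEE) decide whether paging is enabled while the S-mode software (the OS) manages the page tables is a bad arrangement, and in practice it is.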

SBI Implementation

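The SBI is presented as a set of functions implemented in sbi_entry.S. The OS only gets the header file sbi.h and the corresponding function addresses in sbi.S: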

#ifndef _ASM_RISCV_SBI_H
#define _ASM_RISCV_SBI_H

typedef struct {
  unsigned long base;
  unsigned long size;
  unsigned long node_id;
} memory_block_info;

unsigned long sbi_query_memory(unsigned long id, memory_block_info *p);

unsigned long sbi_hart_id(void);
unsigned long sbi_num_harts(void);
unsigned long sbi_timebase(void);
void sbi_set_timer(unsigned long long stime_value);
void sbi_send_ipi(unsigned long hart_id);
unsigned long sbi_clear_ipi(void);
void sbi_shutdown(void);

void sbi_console_putchar(unsigned char ch);
int sbi_console_getchar(void);

void sbi_remote_sfence_vm(unsigned long hart_mask_ptr, unsigned long asid);
void sbi_remote_sfence_vm_range(unsigned long hart_mask_ptr, unsigned long asid, unsigned long start, unsigned long size);
void sbi_remote_fence_i(unsigned long hart_mask_ptr);

unsigned long sbi_mask_interrupt(unsigned long which);
unsigned long sbi_unmask_interrupt(unsigned long which);

#endif

.globl sbi_hart_id; sbi_hart_id = -2048
.globl sbi_num_harts; sbi_num_harts = -2032
.globl sbi_query_memory; sbi_query_memory = -2016
.globl sbi_console_putchar; sbi_console_putchar = -2000
.globl sbi_console_getchar; sbi_console_getchar = -1984
.globl sbi_send_ipi; sbi_send_ipi = -1952
.globl sbi_clear_ipi; sbi_clear_ipi = -1936
.globl sbi_timebase; sbi_timebase = -1920
.globl sbi_shutdown; sbi_shutdown = -1904
.globl sbi_set_timer; sbi_set_timer = -1888
.globl sbi_mask_interrupt; sbi_mask_interrupt = -1872
.globl sbi_unmask_interrupt; sbi_unmask_interrupt = -1856
.globl sbi_remote_sfence_vm; sbi_remote_sfence_vm = -1840
.globl sbi_remote_sfence_vm_range; sbi_remote_sfence_vm_range = -1824
.globl sbi_remote_fence_i; sbi_remote_fence_i = -1808

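The magic numbers in sbi.S above are the virtual addresses of the individual functions. To map the functions to these locations, BBL does some extra work when loading the kernel, as mentioned in the Loading Kernel section. The implementation is as follows: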

  // map SBI at top of vaddr space
  extern char _sbi_end;
  uintptr_t num_sbi_pages = ((uintptr_t)&_sbi_end - DRAM_BASE - 1) / RISCV_PGSIZE + 1;
  assert(num_sbi_pages <= (1 << RISCV_PGLEVEL_BITS));
  for (uintptr_t i = 0; i < num_sbi_pages; i++) {
    uintptr_t idx = (1 << RISCV_PGLEVEL_BITS) - num_sbi_pages + i;
    sbi_pt[idx] = pte_create((DRAM_BASE / RISCV_PGSIZE) + i, PTE_G | PTE_R | PTE_X);
  }
  pte_t* sbi_pte = middle_pt + ((num_middle_pts << RISCV_PGLEVEL_BITS)-1);
  assert(!*sbi_pte);
  *sbi_pte = ptd_create((uintptr_t)sbi_pt >> RISCV_PGSHIFT);

Interested readers can work through the implementation details on their own.
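To see why negative numbers such as -2048 end up on the last page, it helps to reinterpret them as 32-bit virtual addresses and split them the way Sv32 does (10 + 10 + 12 bits). The following is a small sketch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT      12   /* 4 KiB pages */
#define PGLEVEL_BITS 10   /* Sv32: 10 index bits per level */

/* Interpret a negative SBI symbol value as a 32-bit virtual address. */
static uint32_t sbi_vaddr(int32_t symbol_value)
{
    assert(symbol_value < 0);
    return (uint32_t)symbol_value;
}

/* Index into the first-level page table. */
static uint32_t vpn1(uint32_t va) { return va >> (PGSHIFT + PGLEVEL_BITS); }

/* Index into the second-level page table. */
static uint32_t vpn0(uint32_t va) { return (va >> PGSHIFT) & ((1u << PGLEVEL_BITS) - 1); }

/* Byte offset within the page. */
static uint32_t page_offset(uint32_t va) { return va & ((1u << PGSHIFT) - 1); }
```

With these helpers, sbi_hart_id = -2048 becomes 0xfffff800: both indices are 1023 (the last entry at each level) and the offset is 0x800, the middle of the last page. Most consecutive entries in sbi.S are 16 bytes apart.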

SBI Pitfall

All problems in computer science can be solved by another level of indirection... Except for the problem of too many layers of indirection.

— David Wheeler

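Although the SBI implementation is extremely convoluted, so far there seems to be no logical problem with it. Is that really the case? Let us look at an example: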

unsigned long sbi_query_memory(unsigned long id, memory_block_info *p);

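This SBI function cannot be implemented, because it involves passing an address, and as noted above, M-mode always operates in Mbare mode. Passing a 32-bit virtual address from the kernel to the SEE is meaningless, because the SEE only sees physical addresses. This reveals the first problem of the SBI: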

  • The SBI can only pass by value, not by reference
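For the SEE to use a pointer it receives from the supervisor, it would have to translate the virtual address itself by walking the page table. The sketch below shows such a two-level Sv32 walk over a toy in-memory "physical memory". It is illustrative only: the PTE layout (valid bit 0, R/W/X in bits 1 to 3, PPN from bit 10) is an assumption, permission and superpage alignment checks are omitted, and toy_phys and sv32_walk are hypothetical names that do not appear in bbl.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT       12
#define LEVEL_BITS    10
#define PTE_V         0x1u
#define PTE_R         0x2u
#define PTE_W         0x4u
#define PTE_X         0x8u
#define PTE_PPN_SHIFT 10

/* Toy physical memory: NPAGES pages of 1024 32-bit words, indexed by
 * physical page number. Purely illustrative. */
#define NPAGES 4
static uint32_t toy_phys[NPAGES][1 << LEVEL_BITS];

/* Translate va using the two-level table rooted at root_ppn.
 * Returns 0 and stores the physical address in *pa, or -1 on a fault. */
static int sv32_walk(uint32_t root_ppn, uint32_t va, uint64_t *pa)
{
    uint32_t table_ppn = root_ppn;

    assert(pa != 0);
    for (int level = 1; level >= 0; level--) {
        unsigned shift = PGSHIFT + LEVEL_BITS * level;
        uint32_t index = (va >> shift) & ((1u << LEVEL_BITS) - 1);

        if (table_ppn >= NPAGES)
            return -1;                      /* table lies outside toy memory */
        uint32_t pte = toy_phys[table_ppn][index];
        if (!(pte & PTE_V))
            return -1;                      /* invalid entry */

        uint32_t ppn = pte >> PTE_PPN_SHIFT;
        if (pte & (PTE_R | PTE_W | PTE_X)) {
            /* Leaf entry: the low bits of va are kept as the offset. */
            uint64_t offset_mask = ((uint64_t)1 << shift) - 1;
            *pa = (((uint64_t)ppn << PGSHIFT) & ~offset_mask) | (va & offset_mask);
            return 0;
        }
        table_ppn = ppn;                    /* pointer to the next level */
    }
    return -1;                              /* no leaf entry found */
}
```

A real SEE would additionally have to read the root from sptbr and cope with the OS changing its page tables at any time, which is part of why passing references through the SBI is awkward.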

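The second problem is less obvious than the first. Since an SBI call is how the supervisor makes a "system" call into the SEE, a privilege transition from S to M must happen along the way, and RISC-V has only one instruction that performs this transition: ecall. Let us look at the implementation of sbi_console_putchar: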

# console_putchar
.align 4
li a7, MCALL_CONSOLE_PUTCHAR # MCALL_CONSOLE_PUTCHAR == 1
ecall
ret

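All SBI functions are supposed to be implemented this way. A more logical binary interface, however, would read: "To use the console putchar facility provided by the SEE, put the character to be printed into register a0, set register a7 to 1, and execute the ecall instruction." If that does not convince you, consider the following x86 assembly program: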

section .programFlow
    global _start
    _start:
        mov edx, len
        mov ecx, msg
        mov ebx, 0x1    ;select STDOUT stream
        mov eax, 0x4    ;select SYS_WRITE call
        int 0x80        ;invoke SYS_WRITE
        mov ebx, 0x0    ;select EXIT_CODE_0
        mov eax, 0x1    ;select SYS_EXIT call
        int 0x80        ;invoke SYS_EXIT
section .programData
    msg: db "Hello World!",0xa
    len: equ $ - msg

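Here we used the ABI provided by the Linux operating system to print "Hello World!"; functions such as printf and putchar are usually called an API rather than an ABI. The current form of the SBI in RISC-V, a header file plus a set of function addresses, looks more like an "SPI" (a programming interface) than an SBI (a binary interface). This is the second problem of the SBI: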

  • The SBI is over-encapsulated
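With a register-level interface, an OS developer could write the thin wrapper library directly. The sketch below follows the convention quoted above (function number in a7, argument in a0, then ecall). The names see_call and see_console_putchar are hypothetical, and the non-RISC-V branch is only a stand-in so that the sketch can be compiled and run on a development host.

```c
#include <assert.h>
#include <stdio.h>

#define SEE_CONSOLE_PUTCHAR 1   /* same value as MCALL_CONSOLE_PUTCHAR above */

/* Issue one call to the SEE: function number in a7, argument in a0.
 * On non-RISC-V hosts a stand-in is used so the sketch can still run. */
static long see_call(long which, long arg0)
{
    assert(which >= 0);
#if defined(__riscv)
    register long a0 __asm__("a0") = arg0;
    register long a7 __asm__("a7") = which;
    __asm__ volatile("ecall" : "+r"(a0) : "r"(a7) : "memory");
    return a0;
#else
    if (which == SEE_CONSOLE_PUTCHAR)
        return putchar((int)arg0) == EOF ? -1 : 0;
    return -1;                  /* unknown call on the host stand-in */
#endif
}

/* What an OS-side wrapper might look like. */
static long see_console_putchar(unsigned char ch)
{
    return see_call(SEE_CONSOLE_PUTCHAR, ch);
}
```

Nothing here requires a header file or a page of mapped entry points from the SEE; the register convention alone is the interface.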

SBI in BBL

unsigned long sbi_query_memory(unsigned long id, memory_block_info *p);

As stated above, this function cannot be implemented, and yet it is "implemented" in BBL; see machine/sbi_entry.S and machine/sbi_impl.c:

# query_memory
.align 4
tail __sbi_query_memory

uintptr_t __sbi_query_memory(uintptr_t id, memory_block_info *p)
{
  if (id == 0) {
    p->base = first_free_paddr;
    p->size = mem_size + DRAM_BASE - p->base;
    return 0;
  }

  return -1;
}

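This workaround seems fine, but it deserves a closer look. tail __sbi_query_memory can be read as a jump to the function's entry address. The question is this: all of the code above is compiled as part of bbl, where every address is a physical address, so why can the supervisor call it successfully?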

There are roughly two reasons:

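  • The compiler generates position-independent code
  • The relative position of the two pieces of code in the virtual address space is the same as in the physical address space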

Because the workaround depends on these conditions, it stops working once the operating system has taken over management of physical memory.

SBI in the future

The many problems of the SBI are expected to be resolved in Privileged ISA Specification v1.10. Below is our correspondence with the author:

Subject: Re: I'm from Tsinghua University and have some questions about SBI in RISC-V.
From: Andrew Waterman, 2017-4-11 15:27:41
To: 张蔚

Great questions.

On Mon, Apr 10, 2017 at 8:18 PM, 张蔚 <zhangwei15@mails.tsinghua.edu.cn> wrote:
> Dear Dr. Waterman,
>
> My name is Wei Zhang and I'm an undergraduate at Tsinghua University. I'm
> working on porting our teaching operating system (ucore_os_lab) to RISC-V
> under the guidance of Prof. Chen and Prof. Xiang. And I'm confused with SBI
> in RISC-V.
>
> While investigating BBL, I realized that it's inherently difficult to pass
> reference to SBI functions since supervisor lives in virtual address space
> while SEE sees physical address space. Some SBI functions defined in
> privileged spec 1.9.1 involves passing and returning pointers, I suspect
> they can't work properly without manually doing a page walk in SEE.

Yes, this is an unfortunate complication.  We are revising the SBI for
the next version of the spec, 1.10, and have arrived at a simpler
design.  We eliminated some of the calls that pass pointers, in favor
of providing a device tree pointer upon OS boot.  It is a physical
address, but now the OS starts with address translation disabled, so
this works out fairly naturally.

The remaining calls that pass pointers (e.g. SEND_IPI) now use virtual
addresses.

>
> Another question is why SBI takes the form of a collection of virtual
> addresses. Calling a SBI function will transfer control to SEE, so there is
> supposed to be a ecall somewhere in that function. It might be more natural
> to directly tell OS-designers what they should put in each register before
> invoking ecall to get desired functionalities, so they can write a small
> library themselves to wrap things up easily. SBI entries in last page
> require extra effort for both OS-designers and SEE-writers.

Agreed.  The 1.10 design uses ECALL directly, rather than jumps to
virtual addresses.  The original approach was designed to optimize
paravirtualized guest OSes, but we decided the slight overhead in
those cases was worth the simplicity of avoiding the SBI page mapping.

>
> Could you please correct me if I have misunderstood SBI? And if above
> problems do exist, are there plans to solve them is the next privileged spec?
>
> Thank you for your help in this matter.
>
> Sincerely,
>
> Wei Zhang
>
>
>

Host-Target Interface

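The Host-Target Interface (HTIF) was mentioned earlier when introducing the toolchain. Although it does not affect OS developers who use bbl, readers should still be familiar with this important feature. Let us consider how a character is printed to the terminal by bbl.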

Step 0: Declaring Magic Variables

First, two special variables, tohost and fromhost, must be declared in the source; see machine/mtrap.c:

volatile uint64_t tohost __attribute__((aligned(64))) __attribute__((section("htif")));
volatile uint64_t fromhost __attribute__((aligned(64))) __attribute__((section("htif")));

Step 1: Finding Magic Variables

When riscv-fesvr loads bbl, it searches the ELF file for these two variables and records their physical addresses:

std::map<std::string, uint64_t> symbols = load_elf(path.c_str(), &mem);

if (symbols.count("tohost") && symbols.count("fromhost")) {
  tohost_addr = symbols["tohost"];
  fromhost_addr = symbols["fromhost"];
} else {
  fprintf(stderr, "warning: tohost and fromhost symbols not in ELF; can't communicate with target\n");
}

Step 2: Polling

while (!signal_exit && exitcode == 0) {
  if (auto tohost = mem.read_uint64(tohost_addr)) {
    mem.write_uint64(tohost_addr, 0);
    command_t cmd(this, tohost, fromhost_callback);
    device_list.handle_command(cmd);
  } else {
    idle();
  }

  device_list.tick();

  if (!fromhost_queue.empty() && mem.read_uint64(fromhost_addr) == 0) {
    mem.write_uint64(fromhost_addr, fromhost_queue.front());
    fromhost_queue.pop();
  }
}

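On every cycle of this loop, the simulator checks the value of tohost. A nonzero value means the target has sent some request to the host, which then needs to be handled. The description of this process in the Wikipedia article "Polling (computer science)" may help: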

  1. The host repeatedly reads the busy bit of the controller until it becomes clear.
  2. When clear, the host writes in the command register and writes a byte into the data-out register.
  3. The host sets the command-ready bit (set to 1).
  4. When the controller senses command-ready bit is set, it sets busy bit.
  5. The controller reads the command register and since write bit is set, it performs necessary I/O operations on the device. If the read bit is set to one instead of write bit, data from device is loaded into data-in register, which is further read by the host.
  6. The controller clears the command-ready bit once everything is over, it clears error bit to show successful operation and reset busy bit (0).
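The handshake can be reduced to a single-threaded toy model in C. This is only a sketch of the idea, with two ordinary variables standing in for the mailbox words; the names toy_tohost, toy_fromhost, target_post, host_poll_once and host_reply are hypothetical and do not appear in riscv-fesvr or bbl.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the two HTIF mailbox words. */
static volatile uint64_t toy_tohost;
static volatile uint64_t toy_fromhost;

/* Target side: post a request if the mailbox is free.
 * Returns 0 on success, -1 if the previous request is still pending. */
static int target_post(uint64_t request)
{
    assert(request != 0);          /* 0 means "no request" */
    if (toy_tohost != 0)
        return -1;
    toy_tohost = request;
    return 0;
}

/* Host side: one polling step. Returns 1 and stores the request in *out
 * if one was pending (clearing the mailbox), otherwise returns 0. */
static int host_poll_once(uint64_t *out)
{
    uint64_t request = toy_tohost;
    if (request == 0)
        return 0;                  /* nothing pending; the real loop idles */
    toy_tohost = 0;                /* acknowledge by clearing the word */
    *out = request;
    return 1;
}

/* Host side: deliver a response once the target has consumed the last one.
 * Returns 0 on success, -1 if the previous response is still unread. */
static int host_reply(uint64_t response)
{
    if (toy_fromhost != 0)
        return -1;
    toy_fromhost = response;
    return 0;
}
```

The zero value plays the role of the "busy" and "command-ready" bits in the quoted description: a nonzero tohost is a pending command, and a zero fromhost means the target is ready for the next response.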

Step 3: Writing/Reading Magic Numbers

The header machine/htif.h in bbl defines some macros that make it easier to modify tohost and to read fromhost:

#if __riscv_xlen == 64
# define TOHOST_CMD(dev, cmd, payload) \
  (((uint64_t)(dev) << 56) | ((uint64_t)(cmd) << 48) | (uint64_t)(payload))
#else
# define TOHOST_CMD(dev, cmd, payload) ({ \
  if ((dev) || (cmd)) __builtin_trap(); \
  (payload); })
#endif
#define FROMHOST_DEV(fromhost_value) ((uint64_t)(fromhost_value) >> 56)
#define FROMHOST_CMD(fromhost_value) ((uint64_t)(fromhost_value) << 8 >> 56)
#define FROMHOST_DATA(fromhost_value) ((uint64_t)(fromhost_value) << 16 >> 16)

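Note that with a 32-bit cross-compiler, __riscv_xlen is 32, so using TOHOST_CMD with a nonzero dev or cmd runs into __builtin_trap(), which, depending on the compiler, may be an infinite loop or an immediate exit. The meaning and values of the dev, cmd and payload parameters are left for interested readers to explore.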
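The 64-bit packing used by these macros can be exercised on any host. The sketch below mirrors the bit layout of the 64-bit TOHOST_CMD and the FROMHOST_* macros as plain functions; the function names are hypothetical, and the dev and cmd values used in the example are arbitrary, not a statement about which devices exist.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a device number, a command number and a 48-bit payload into one word:
 * dev in bits 63:56, cmd in bits 55:48, payload in bits 47:0. */
static uint64_t htif_pack(uint8_t dev, uint8_t cmd, uint64_t payload)
{
    assert((payload >> 48) == 0);  /* payload must fit in the low 48 bits */
    return ((uint64_t)dev << 56) | ((uint64_t)cmd << 48) | payload;
}

/* Field extractors using the same shifts as the FROMHOST_* macros. */
static uint8_t  htif_dev(uint64_t word)  { return (uint8_t)(word >> 56); }
static uint8_t  htif_cmd(uint64_t word)  { return (uint8_t)(word << 8 >> 56); }
static uint64_t htif_data(uint64_t word) { return word << 16 >> 16; }
```

For example, htif_pack(1, 1, 'A') yields 0x0101000000000041, and the three extractors recover 1, 1 and 0x41 from it.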

Instruction Emulation

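bbl also provides instruction emulation, supplying the kernel above it with instructions that the simulator does not implement; this feature is worth mentioning as well. Consider what happens when code in S-mode tries to read the time: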

asm volatile("rdtime a0");

Because the rdtime instruction is not implemented, executing this statement raises an illegal instruction exception, which is caught by bbl's trap handler:

trap_table:
  .word bad_trap
  .word bad_trap
  .word illegal_insn_trap
  .word bad_trap
  .word misaligned_load_trap
  .word bad_trap
  .word misaligned_store_trap
  .word bad_trap
  .word bad_trap
  .word mcall_trap
  .word bad_trap
  .word bad_trap
#define SOFTWARE_INTERRUPT_VECTOR 12
  .word software_interrupt
#define TIMER_INTERRUPT_VECTOR 13
  .word timer_interrupt
#define TRAP_FROM_MACHINE_MODE_VECTOR 14
  .word __trap_from_machine_mode

Note that illegal_insn_trap in trap_table is the handler for illegal instructions:

void illegal_insn_trap(uintptr_t* regs, uintptr_t mcause, uintptr_t mepc)
{
  uintptr_t mstatus;
  insn_t insn = get_insn(mepc, &mstatus);

  if (unlikely((insn & 3) != 3))
    return truly_illegal_insn(regs, mcause, mepc, mstatus, insn);

  write_csr(mepc, mepc + 4);

  extern uint32_t illegal_insn_trap_table[];
  uint32_t* pf = (void*)illegal_insn_trap_table + (insn & 0x7c);
  emulation_func f = (emulation_func)(uintptr_t)*pf;
  f(regs, mcause, mepc, mstatus, insn);
}
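The bit tests in this handler can be checked against the concrete encoding of rdtime a0. The following sketch assumes the standard encoding of that instruction (csrrs a0, time, x0, i.e. 0xc0102573); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RDTIME_A0 0xc0102573u   /* encoding of "rdtime a0" (csrrs a0, time, x0) */

/* True if the low two bits mark a 32-bit (non-compressed) instruction. */
static int is_32bit_insn(uint32_t insn) { return (insn & 3) == 3; }

/* Byte offset into a table of 32-bit entries, selected by opcode bits 6:2. */
static uint32_t trap_table_offset(uint32_t insn)
{
    assert(is_32bit_insn(insn));
    return insn & 0x7c;
}

/* Destination register number (bits 11:7). */
static uint32_t insn_rd(uint32_t insn)  { return (insn >> 7) & 0x1f; }

/* CSR number (bits 31:20) of a SYSTEM instruction. */
static uint32_t insn_csr(uint32_t insn) { return insn >> 20; }
```

For rdtime a0 the offset is 0x70, that is entry 28 of the table (the SYSTEM opcode), the destination register is x10 (a0), and the CSR number is 0xc01 (time).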

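After fetching the "illegal" instruction with get_insn and performing some checks, bbl hands the instruction to the emulate_system_opcode function. After a further series of checks and function calls, control finally reaches emulate_read_csr: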

static inline int emulate_read_csr(int num, uintptr_t mstatus, uintptr_t* result)
{
  uintptr_t counteren =
    EXTRACT_FIELD(mstatus, MSTATUS_MPP) == PRV_U ? read_csr(mucounteren) :
                                                   read_csr(mscounteren);

  switch (num)
  {
    case CSR_TIME:
      if (!((counteren >> (CSR_TIME - CSR_CYCLE)) & 1))
        return -1;
      *result = *mtime;
      return 0;
#if __riscv_xlen == 32
    case CSR_TIMEH:
      if (!((counteren >> (CSR_TIME - CSR_CYCLE)) & 1))
        return -1;
      *result = *mtime >> 32;
      return 0;
#endif
  }
  return -1;
}

bbl reads the correct time from mtime and returns it. The object that mtime points to is also part of the HTIF mentioned earlier.
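The access check and the 32-bit split of the 64-bit timer value can be reproduced on a host with plain integers. This is a simplified sketch of the logic shown above, not bbl's code: the counter-enable mask and the timer value are passed in as parameters instead of being read from CSRs and from mtime, and toy_read_time_csr is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_CYCLE 0xc00
#define CSR_TIME  0xc01
#define CSR_TIMEH 0xc81

/* Emulate a read of time or timeh on a 32-bit target.
 * counteren is the applicable counter-enable mask and now is the 64-bit
 * timer value. Returns 0 and stores the 32-bit result, or -1 if the CSR
 * is not handled or the read is not permitted. */
static int toy_read_time_csr(int num, uint32_t counteren, uint64_t now,
                             uint32_t *result)
{
    assert(result != 0);
    if (num != CSR_TIME && num != CSR_TIMEH)
        return -1;
    if (!((counteren >> (CSR_TIME - CSR_CYCLE)) & 1))
        return -1;                          /* bit 1 gates access to time */
    *result = (num == CSR_TIME) ? (uint32_t)now : (uint32_t)(now >> 32);
    return 0;
}
```

On a 32-bit target, software reading the full 64-bit time therefore needs two reads, one of time and one of timeh.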

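From the kernel's point of view, the current time has been placed in the register once the instruction completes, so this kind of emulation is completely transparent to the operating system. bbl can use the same technique to emulate floating-point operations in environments that lack the floating-point extension.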