Clifford Wolf edited this page Nov 17, 2019 · 10 revisions

TODOs for RISC-V BitManip v0.93:

Should we rename cmov/cmix to mux/mix?

-> Discuss on mailing list

Should there be a [un]shfliw instruction?

-> Discuss on mailing list

More pointer arithmetic instructions

In response to a request by a member of this task group: Add SH1ADD, SH2ADD, SH3ADD, SH4ADD, SH1ADDU.W, SH2ADDU.W, SH3ADDU.W, and SH4ADDU.W instructions with the following semantics:

SHnADD    RD, RS1, RS2     :=     RD = (RS1 << n) + RS2
SHnADDU.W RD, RS1, RS2     :=     RD = ((RS1 & 0xFFFFFFFF) << n) + RS2

These instructions only replace two other instructions each (SLLI + C.ADD or SLLIU.W + C.ADD), but these are extremely common operations for pointer arithmetic, so it might be worth having the extra instructions.
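
As a rough illustration of the semantics above, the two proposed forms could be modelled in C as follows. This is only a sketch assuming RV64 (XLEN = 64); the function names shnadd, shnaddu_w and elem_addr_u64 are made up for this example and are not part of any proposal.

```c
#include <stdint.h>

/* Sketch of SHnADD on RV64: RD = (RS1 << n) + RS2, for n in 1..4. */
static inline uint64_t shnadd(uint64_t rs1, uint64_t rs2, unsigned n)
{
    return (rs1 << n) + rs2;
}

/* Sketch of SHnADDU.W on RV64: RS1 is zero-extended from 32 bits first. */
static inline uint64_t shnaddu_w(uint64_t rs1, uint64_t rs2, unsigned n)
{
    return ((rs1 & 0xFFFFFFFFu) << n) + rs2;
}

/* Hypothetical use: address of element idx in an array of 8-byte items,
 * where idx is an unsigned 32-bit index (the SH3ADDU.W case). */
static inline uint64_t elem_addr_u64(uint64_t base, uint32_t idx)
{
    return shnaddu_w(idx, base, 3);
}
```

The last helper shows the pointer-arithmetic case that motivates the request: one instruction instead of a shift followed by an add.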

We might also want to create a new Zba (address) category for those 8 instructions and ADD[I]WU, SUBWU, ADDU.W, SUBU.W, and SLLIU.W.

-> Discuss on mailing list

Move shift-one instructions from Zbb to Zbp

People have raised concerns that shift-ones are not common enough to justify inclusion in Zbb.

-> Discuss on mailing list

Add more example applications

(some of those are already in the doc)

clz, ctz, pcnt:

  • FP emulation
  • Hamming distance, parity
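
A possible way to present the Hamming distance and parity examples is sketched below. pcnt32 is a portable software stand-in for the pcnt instruction (names are illustrative only); with the instruction available, each helper reduces to an xor or an and plus one pcnt.

```c
#include <stdint.h>

/* Software stand-in for pcnt: count set bits by clearing the lowest one. */
static inline unsigned pcnt32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Hamming distance: number of bit positions in which a and b differ. */
static inline unsigned hamming32(uint32_t a, uint32_t b)
{
    return pcnt32(a ^ b);
}

/* Parity: 1 if an odd number of bits are set, else 0. */
static inline unsigned parity32(uint32_t x)
{
    return pcnt32(x) & 1;
}
```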

And-with-complement (andc):

  • MIX pattern
  • applying masks
  • And-inverter-graph evaluation
  • SHA-2 (1x in each round, ≈ 3% of operations)
  • SHA-3 (25x in each round, ≈ 15% of operations)


  • (dyn) mask generation
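
The MIX pattern and the SHA-2 use listed under and-with-complement could be illustrated roughly as follows. andc32 models the andc instruction; mix32 and sha2_ch are illustrative names chosen for this sketch (sha2_ch is the standard SHA-2 "Ch" function, which needs one and-with-complement per evaluation).

```c
#include <stdint.h>

/* Model of andc: a AND (NOT b). */
static inline uint32_t andc32(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* MIX pattern: take bits from a where mask is 1, from b where mask is 0. */
static inline uint32_t mix32(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | andc32(b, mask);
}

/* SHA-2 Ch(x, y, z) = (x AND y) XOR (NOT x AND z). */
static inline uint32_t sha2_ch(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ andc32(z, x);
}
```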

Generalized Reverse (grev, grevi):

  • bit permutation
  • endian-swapping (e.g. for big-endian)
  • bit reversal (e.g. for FFT)
  • bitboards (e.g. for chess engines)
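
For the endian-swapping and bit-reversal examples, a 32-bit software model of grev along the lines of the usual butterfly formulation might look like this. It is a sketch for illustration, not a normative definition; the wrapper names bswap32_grev and brev32_grev are made up here.

```c
#include <stdint.h>

/* Sketch of grev for XLEN = 32: each set bit k of the shift amount swaps
 * adjacent blocks of 2^k bits. */
static inline uint32_t grev32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    unsigned shamt = rs2 & 31;
    if (shamt &  1) x = ((x & 0x55555555u) <<  1) | ((x & 0xAAAAAAAAu) >>  1);
    if (shamt &  2) x = ((x & 0x33333333u) <<  2) | ((x & 0xCCCCCCCCu) >>  2);
    if (shamt &  4) x = ((x & 0x0F0F0F0Fu) <<  4) | ((x & 0xF0F0F0F0u) >>  4);
    if (shamt &  8) x = ((x & 0x00FF00FFu) <<  8) | ((x & 0xFF00FF00u) >>  8);
    if (shamt & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}

/* Byte swap (endian conversion): swap 8-bit and 16-bit blocks. */
static inline uint32_t bswap32_grev(uint32_t x) { return grev32(x, 24); }

/* Full bit reversal (e.g. for FFT index permutation): all five stages. */
static inline uint32_t brev32_grev(uint32_t x)  { return grev32(x, 31); }
```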

Generalized Shuffle (shfl, unshfl, shfli, unshfli):

  • bit permutation
  • LUT input permutations
  • bitboards (e.g. chess engines)
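
One way to illustrate the bit-permutation use of shfl is the "zip" case, which interleaves the lower and upper halves of a word. The 32-bit model below is a sketch under the assumption that shfl applies its stages from the widest block to the narrowest; the stage masks and helper names are illustrative and should be checked against the spec text before being copied into the doc.

```c
#include <stdint.h>

/* One shuffle stage: bits under maskL move left by n, bits under maskR
 * move right by n, all other bits stay in place. */
static inline uint32_t shuffle32_stage(uint32_t src, uint32_t maskL,
                                       uint32_t maskR, unsigned n)
{
    uint32_t x = src & ~(maskL | maskR);
    x |= ((src << n) & maskL) | ((src >> n) & maskR);
    return x;
}

/* Sketch of shfl for XLEN = 32 (4 control bits). */
static inline uint32_t shfl32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    unsigned shamt = rs2 & 15;
    if (shamt & 8) x = shuffle32_stage(x, 0x00FF0000u, 0x0000FF00u, 8);
    if (shamt & 4) x = shuffle32_stage(x, 0x0F000F00u, 0x00F000F0u, 4);
    if (shamt & 2) x = shuffle32_stage(x, 0x30303030u, 0x0C0C0C0Cu, 2);
    if (shamt & 1) x = shuffle32_stage(x, 0x44444444u, 0x22222222u, 1);
    return x;
}

/* "zip": bit i of the lower half goes to bit 2i, bit i of the upper half
 * goes to bit 2i+1. */
static inline uint32_t zip32(uint32_t x) { return shfl32(x, 15); }
```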

Bit Extract/Deposit (bext, bdep):

  • maybe google more examples using x86 pext/pdep

Min/max instructions (min, max, minu, maxu):

  • branchless code
  • saturated arithmetic
  • absolute value
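
The three min/max uses above might be illustrated as follows. min32 and max32 model the signed min and max instructions; clamp32, sat_add16 and abs32 are illustrative helpers, each of which maps to a short branch-free sequence when those instructions exist.

```c
#include <stdint.h>

/* Models of the signed min/max instructions. */
static inline int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Branchless clamp of x into [lo, hi]: one max followed by one min. */
static inline int32_t clamp32(int32_t x, int32_t lo, int32_t hi)
{
    return min32(max32(x, lo), hi);
}

/* Saturated 16-bit addition: add in 32 bits, then clamp to int16 range. */
static inline int16_t sat_add16(int16_t a, int16_t b)
{
    return (int16_t)clamp32((int32_t)a + (int32_t)b, INT16_MIN, INT16_MAX);
}

/* Absolute value as max(x, -x). The negation is done in unsigned
 * arithmetic so INT32_MIN wraps (as the hardware would) instead of
 * invoking signed-overflow undefined behavior in C. */
static inline int32_t abs32(int32_t x)
{
    int32_t neg = (int32_t)(0u - (uint32_t)x);
    return max32(x, neg);
}
```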

Carry-less multiply (clmul, clmulh):

  • CRC and CRC-like ("industry") algorithms
  • Hashing, PRNG
  • Gray decode
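
The Gray decode example could be written up roughly like this. clmul32 and clmulh32 are bit-serial software models of the two instructions for XLEN = 32 (a sketch, not normative); the decode helper relies on the fact that carry-less multiplication by an all-ones operand XORs together all right-shifted copies of the input.

```c
#include <stdint.h>

/* Model of clmul: low 32 bits of the carry-less product. */
static inline uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < 32; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 << i;
    return x;
}

/* Model of clmulh: high 32 bits of the carry-less product. */
static inline uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 1; i < 32; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (32 - i);
    return x;
}

/* Gray encode: n XOR (n >> 1). */
static inline uint32_t gray_encode32(uint32_t n) { return n ^ (n >> 1); }

/* Gray decode: XOR of g >> k for k = 0..31. The k >= 1 terms are exactly
 * clmulh(g, all-ones), so the decode is one clmulh plus one xor. */
static inline uint32_t gray_decode32(uint32_t g)
{
    return g ^ clmulh32(g, 0xFFFFFFFFu);
}
```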

Bit-matrix operations (bmatxor, bmator, bmatflip):

  • bit permutation (within bytes)
  • byte permutation
  • bit duplication (within bytes)
  • byte duplication
  • many xor / many or (think "vector lite")
  • full NxM bit matrix multiply (using many 8x8 ops)
  • searching (finding zero bytes in 8-byte chunk)
  • linear algebra in GF(2)
  • arithmetic in GF(2^k) with k ≤ 8

Funnel shift (fsl, fsr, fsri):

  • mask generation
  • bit/byte permutations on >XLEN blocks
  • consuming a non-byte-aligned bit-stream of variable-length words
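
For the ">XLEN blocks" and bit-stream examples, a 32-bit software model of the funnel shift left could serve as a starting point. This is a sketch for XLEN = 32 with an illustrative function name; the shift amount is taken modulo 2*XLEN, and amounts of XLEN or more swap the roles of the two data operands.

```c
#include <stdint.h>

/* Sketch of fsl for XLEN = 32: shift the 64-bit value {rs1, rs3} left by
 * rs2 (mod 64) with wrap-around and return the upper 32 bits. */
static inline uint32_t fsl32(uint32_t rs1, uint32_t rs2, uint32_t rs3)
{
    unsigned shamt = rs2 & 63;
    uint32_t a = rs1, b = rs3;
    if (shamt >= 32) {          /* shifting by a full word swaps operands */
        shamt -= 32;
        a = rs3;
        b = rs1;
    }
    /* Guard shamt == 0: a 32-bit shift by 32 is undefined in C. */
    return shamt ? (a << shamt) | (b >> (32 - shamt)) : a;
}
```

With rs1 holding the current word of a stream and rs3 the next one, a single fsl yields 32 bits starting at an arbitrary bit offset, which is the core step when consuming a non-byte-aligned stream.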