[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

nibrunieAtSi5 · 2024-03-28T14:37:50Z

This pull request is the riscv-isa-manual version of a pull request started on riscv-crypto: riscv/riscv-crypto#362.

/!\ This pull request is a draft for the future fast track extensions.

This pull requests draft the changes associated with two fast track extensions for vector crypto.

During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.

The official demand that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafter here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing

New features:

Zvbc32e: Extending vclmul[h].v[vh] instruction to support SEW=32-bit value
- should be available standalone (ELEN >= 32) or in addition to Zvbc (ELEN >= 64)
- no new encoding
Zvkgs: Adding .vs variants to vghsh and vghmul
- should depend on Zvkg
- new encodings

Open questions:

Should Zvbc32e be allowed when ELEN >= 32 without depending on Zvbc ? (Answer: YES)
Should Zvbc32e support SEW=16 ? (SEW=8 ?)
Find encodings
~~How to name the two new extensions~~
Do we need to define a Zvkt(bc/bc32e) to extend Zvkt to the extension of vclmul[h/] defined in Zvbc32e ?

Related changes:

spike-isa-sim modifications
- add support for Zvbc32e and Zvkgs: Vector crypto additional riscv-software-src/riscv-isa-sim#1748
encoding proposal in riscv-opcodes repo: Vector crypto additional extensions: Zvbc32e and Zvkgs riscv-opcodes#270
adapt vector crypto code samples for Zvbce32 https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvbc-test.c
adapt vector crypto code samples for Zvkgs https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkg-test.c
Sail models
- Zvkgs
- Zvbc32e

Draft versions:

Version	pdf
v0.0.1 (August 31st 2023)	https://github.com/riscv/riscv-crypto/files/12487628/riscv-crypto-spec-vector-extra.pdf
v0.0.2 (January 17th 2024)	https://github.com/riscv/riscv-crypto/files/13970691/riscv-crypto-spec-vector-extra.pdf
v0.0.3 (February 1st 2024)	https://github.com/riscv/riscv-crypto/files/14146438/riscv-crypto-spec-vector-extra_v0.0.3.pdf
v0.0.4 (February 6th 2024)	riscv-crypto-spec-vector-extra_v0.0.4.pdf
v0.0.5 (March 7th 2024)	riscv-crypto-spec-vector-extra_v0.0.5.pdf
v0.0.6 (June 19th 2024)	riscv-crypto-spec-vector-extra_v0.0.6.pdf
v0.0.7 (July 31st 2024)	riscv-crypto-spec-vector-extra_v0.0.7.pdf

Changelogs

from v0.0.5 to v0.0.6:
- adding vs2 / vd overlap as reserved encoding for new vghsh.vs / vgmul.vs instructions (review feedback from @QJtaibai )

Original Plan for the fast track schedule

References

project announcement on the unpriv mailing list (https://lists.riscv.org/g/tech-unprivileged/message/568) and the crypto TG mailing list (https://lists.riscv.org/g/tech-crypto-ext/message/944)

QJtaibai · 2024-06-05T12:13:28Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

nibrunieAtSi5 · 2024-06-20T04:23:42Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

Yunzezhu94 · 2024-06-25T03:09:23Z

Hello! I have a question that in the instruction part of document vghsh.vs shares same encoding with vghsh.vv, and has func6:101100, while in Appendix A it looks vghsh.vs have func6:100011. I wonder which one is correct?

src/vector-crypto-additional.adoc

nibrunie · 2024-07-05T17:37:36Z

Hello! I have a question that in the instruction part of document vghsh.vs shares same encoding with vghsh.vv, and has func6:101100, while in Appendix A it looks vghsh.vs have func6:100011. I wonder which one is correct?

Appendix A and the riscv-v opcode Pull request (https://github.com/nibrunieAtSi5/riscv-opcodes/pull/1/files) are right, the instruciton part of the document was incorrect. I have fixed it.

Note that those opcodes are simply suggestion at this point (but that does not mean the suggestions should not be consistent).

Thank you for pointing that out.

nibrunieAtSi5 · 2024-08-01T04:45:16Z

I have developed a small example to demonstrate how Zvbc32e can be leveraged to implement CRC (with the folding method): https://github.com/nibrunie/rvv-examples/blob/e55d529fe54316a95b733aea82cadbfbfbf08e67/src/crc/vector_crc_be_zvbc32e.c

The current implementation provides "performance" of about 0.65 instruction per bytes (on 1MB input buffer).

nibrunie · 2024-08-01T04:53:47Z

riscv-crypto-spec-vector-extra_v0.0.7.pdf

These two extensions add addtional instructions for carryless multiplication with 32-bits elements and Vector-Scalar GCM instructions. Please see riscv/riscv-isa-manual#1306.

wangpc-pp · 2024-08-14T07:22:20Z

src/vector-crypto-additional.adoc

+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'OP-P'},


I think we have renamed it to OP_VE in #1311.

wangpc-pp · 2024-08-15T06:20:09Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

topperc · 2024-08-15T06:46:34Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

wangpc-pp · 2024-08-15T06:57:54Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

Yeah, thanks! I get it! But it seems that vd can overlap vs1 for reductions, aren't they the same?

topperc · 2024-08-15T07:02:13Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

Yeah, thanks! I get it! But it seems that vd can overlap vs1 for reductions, aren't they the same?

Maybe the assumption for reductions is that vd would only be updated after all of vs1 is read since you need to see all elements before you know the result.

aswaterman · 2024-08-15T07:18:14Z

@topperc Right, for reductions, the amount of additional state that’s needed is nil for many implementations, but in the worst case, it’s bounded by ELEN. Here, I guess it’s bounded by EGW. That difference could be material for narrow vector machines. @nibrunieAtSi5 can you illuminate the situation?

DavidYu360 · 2024-08-15T14:16:34Z

What about the VPR alignment rules for vghsh.vs / vgmul.vs?
As I mentioned in riscv-software-src/riscv-isa-sim#1548

Possible alignment value: 1, 2, 4, 8 (1 for any VPR, 2 for V0, V2, ...)
Prerequisite: VLEN * LMUL >= EGW

Single key in .vs inst: max(EGW/VLEN, 1)

Others: max(LMUL, 1)

These two extensions add addtional instructions for carryless multiplication with 32-bits elements and Vector-Scalar GCM instructions. Please see riscv/riscv-isa-manual#1306.

nibrunieAtSi5 · 2024-08-19T10:18:02Z

The non-overlap constraint for vector element group is part of the vector crypto specification

riscv-isa-manual/src/vector-crypto.adoc

Line 311 in 04179ad

Source/Destination overlap constraints::

In the case of the .vs instructions defined in this specification, vs2 holds a 128-bit scalar element group.
For implementations with VLEN ≥ 128, vs2 refers to a single register. Thus, the vd register group must not
overlap the vs2 register.
However, in implementations where VLEN < 128, vs2 refers to a register group comprised of the number
of registers needed to hold the 128-bit scalar element group. In this case, the vd register group must not
overlap this vs2 register group.

[%autowidth]
[%header,cols="4,4,4"]
|===
| Instruction
| Register
| Cannot Overlap

| vaes*.vs | vs2 | vd
| vsm4r.vs | vs2 | vd
| vsha2c[hl] | vs1, vs2 | vd
| vsha2ms | vs1, vs2 | vd
| vsm3me | vs2 | vd
| vsm3c | vs2 | vd

It was defined as no overlapping between vector register groups.

This was not necessarily repeated for each operation

riscv-isa-manual/src/vector-crypto.adoc

Line 1530 in 04179ad

    
           * Only for the `.vs` form: the `vd` register group overlaps the `vs2` scalar element group

Only for the .vs form: the vd register group overlaps the vs2 scalar element group

I think this constraint makes sense here as well since allowing overlapping would constraint uarch which cannot save/stage the scalar element group before finishing the operation (as stated by @topperc it is highly likely that for LMUL > EGW / VLEN implementations will iterate to compute the operation over the full vector register groups in a few cycles). Note that .vs makes sense for vl / EGS > 1 since this is where we can actually exploit the fact of having a scalar element group.
The size of vs2 vector register group should be understood as with EMUL = min(EGW / VLEN, 1).
The example in the base vector crypto spec seems to hint this way:

However, in implementations where VLEN < 128, vs2 refers to a register group comprised of the number
of registers needed to hold the 128-bit scalar element group.

but should maybe be generalized to EGW.

topperc · 2024-08-29T05:26:58Z

Zvkgs: Adding .vs variants to vghsh and vghmul
should depend on Zvkgs

Is the second line supposed to say Zvkg rather than Zvkgs?

nibrunie · 2024-09-02T16:24:13Z

Zvkgs: Adding .vs variants to vghsh and vghmul
should depend on Zvkgs

Is the second line supposed to say Zvkg rather than Zvkgs?

Yes, you are right @topperc , this is a typo dependency to Zvkg was intended (the assumption is that the vector-scalar(element group) variant do not make sense without the vector-vector variant.

httoracle · 2024-09-09T09:08:10Z

In the v0.0.7 spec, I have a question of the operation code of the vghsh.vs and vgmul.vs. Take vghsh.vs for example, It says H is common to all element groups, and you put the "let H = brev8(...)" code outside the loop. I suspect the H of all the group should be the same, and is from the first 128bit of vs2. But inside the each loop, H will be changed.
So if you write just like the operation code in the spec, the H of the next loop will not be the first 128bit of the vs2. I want to ask it is just the purpose of the instruction or is just a miss?
If you want the H to be the same value in the beginning of the each loop, you should put "let H = brev8(...)" inside the loop, so in the beginning of the loop, we can get the H value from vs2 again.

nibrunie · 2024-09-10T03:47:28Z

In the v0.0.7 spec, I have a question of the operation code of the vghsh.vs and vgmul.vs. Take vghsh.vs for example, It says H is common to all element groups, and you put the "let H = brev8(...)" code outside the loop. I suspect the H of all the group should be the same, and is from the first 128bit of vs2. But inside the each loop, H will be changed. So if you write just like the operation code in the spec, the H of the next loop will not be the first 128bit of the vs2. I want to ask it is just the purpose of the instruction or is just a miss? If you want the H to be the same value in the beginning of the each loop, you should put "let H = brev8(...)" inside the loop, so in the beginning of the loop, we can get the H value from vs2 again.

@httoracle You are right.
H needs to be reset inside the outer loop. this is a mistake in the current behavior code.
It should be fixed by 9982d13.

nibrunie · 2024-09-10T04:28:13Z

Additional Vector Crypto extension v0.0.8 (September 9th 2024):
riscv-crypto-spec-vector-extra_v0.0.8.pdf

Changelog from v0.0.7:

Fixing behavior of vghsh.vs and vgmul.vs (thanks to @httoracle )
Upgrading alias for 0x77 opcode from OP-P to OP-VE (thanks to @wangpc-pp )

…ensions

Signed-off-by: Nicolas Brunie <nibrunie@gmail.com>

nibrunieAtSi5 mentioned this pull request Mar 28, 2024

[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs riscv/riscv-crypto#362

Closed

13 tasks

nibrunie reviewed Jul 5, 2024

View reviewed changes

src/vector-crypto-additional.adoc Outdated Show resolved Hide resolved

henry-hsieh mentioned this pull request Jul 23, 2024

Add ratified or drafted extensions henry-hsieh/riscv-asm-vim#1

Open

72 tasks

This was referenced Jul 25, 2024

Vector crypto additional riscv-software-src/riscv-isa-sim#1748

Draft

Vector crypto additional extensions: Zvbc32e and Zvkgs riscv/riscv-opcodes#270

Draft

wangpc-pp mentioned this pull request Aug 14, 2024

[RISCV][MC] Support experimental extensions Zvbc32e and Zvkgs llvm/llvm-project#103709

Merged

wangpc-pp reviewed Aug 14, 2024

View reviewed changes

nibrunieAtSi5 added 2 commits September 20, 2024 14:04

[AdditionalVectorCrypto] adding section on addition vector crypto ext…

ee09c33

…ensions

fixing some inter/intra document links + typos

b9dda07

nibrunie added 3 commits September 20, 2024 14:04

[extra-vector-crypto] review feedback: vd/vs2 is reserved

a6efe58

Fixing vghsh.vs opcode

320d64f

Signed-off-by: Nicolas Brunie <nibrunie@gmail.com>

Fixing H value reset for vghsh/vgmul

c4f3549

Signed-off-by: Nicolas Brunie <nibrunie@gmail.com>

nibrunie force-pushed the additional-vector-crypto branch from 9982d13 to 9b12009 Compare September 20, 2024 21:11

Integrating review feedback from the ARC

a12c20d

nibrunie force-pushed the additional-vector-crypto branch from 9b12009 to a12c20d Compare September 20, 2024 21:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

nibrunieAtSi5 commented Mar 28, 2024 •

edited

Loading

QJtaibai commented Jun 5, 2024

nibrunieAtSi5 commented Jun 20, 2024

Yunzezhu94 commented Jun 25, 2024

nibrunie commented Jul 5, 2024

nibrunieAtSi5 commented Aug 1, 2024

nibrunie commented Aug 1, 2024

wangpc-pp Aug 14, 2024

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024 •

edited

Loading

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024

aswaterman commented Aug 15, 2024

DavidYu360 commented Aug 15, 2024

nibrunieAtSi5 commented Aug 19, 2024

topperc commented Aug 29, 2024

nibrunie commented Sep 2, 2024

httoracle commented Sep 9, 2024

nibrunie commented Sep 10, 2024 •

edited

Loading

nibrunie commented Sep 10, 2024

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

Are you sure you want to change the base?

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

Conversation

nibrunieAtSi5 commented Mar 28, 2024 • edited Loading

New features:

Open questions:

Related changes:

Draft versions:

Changelogs

Original Plan for the fast track schedule

References

QJtaibai commented Jun 5, 2024

nibrunieAtSi5 commented Jun 20, 2024

Yunzezhu94 commented Jun 25, 2024

nibrunie commented Jul 5, 2024

nibrunieAtSi5 commented Aug 1, 2024

nibrunie commented Aug 1, 2024

wangpc-pp Aug 14, 2024

Choose a reason for hiding this comment

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024 • edited Loading

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024

aswaterman commented Aug 15, 2024

DavidYu360 commented Aug 15, 2024

nibrunieAtSi5 commented Aug 19, 2024

topperc commented Aug 29, 2024

nibrunie commented Sep 2, 2024

httoracle commented Sep 9, 2024

nibrunie commented Sep 10, 2024 • edited Loading

nibrunie commented Sep 10, 2024

nibrunieAtSi5 commented Mar 28, 2024 •

edited

Loading

topperc commented Aug 15, 2024 •

edited

Loading

nibrunie commented Sep 10, 2024 •

edited

Loading