-
Notifications
You must be signed in to change notification settings - Fork 987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec checker tool #8722
Merged
Merged
Spec checker tool #8722
Changes from all commits
Commits
Show all changes
42 commits
Select commit
Hold shift + click to select a range
2e7a187
Add specdocs static code analyzer
farazdagi 6f70179
docs pulling script
farazdagi 5c6ce1f
update content pulling script
farazdagi 8d58f93
add test
farazdagi 49cb9e2
better parsing of incoming docs
farazdagi 4195460
update test
farazdagi 50d6eeb
implements analyzer
farazdagi b58d04b
separate tool
farazdagi 49159c7
remove analyzer code
farazdagi 4691d96
cleanup
farazdagi cf14799
Merge branch 'develop' into spec-checker-tool
farazdagi 0d50453
deep source fixes
farazdagi a97896e
untrack raw specs files
farazdagi a404228
add back phase0 defs
farazdagi 3c892fa
Merge branch 'develop' into spec-checker-tool
farazdagi d6f013d
update spec texts
farazdagi fcc732d
re-arrange code
farazdagi 0754340
updated spec list
farazdagi 05c4f4b
Merge branch 'develop' into spec-checker-tool
farazdagi d30eb53
cleanup
farazdagi fd738f8
more comments and readme
farazdagi 5bb06b5
add merkle proofs specs
farazdagi ef0b3cc
add extra.md
farazdagi f0b083a
mark wrong length issue
farazdagi 86cf6f7
Merge branch 'develop' into spec-checker-tool
farazdagi 32433dd
update readme
farazdagi 8dc9da5
update readme
farazdagi 14a112c
remove non-def snippets
farazdagi e048cad
update comment
farazdagi a376e01
check numrows
farazdagi d95faca
Merge branch 'develop' into spec-checker-tool
farazdagi 3bcc98c
Merge branch 'develop' into spec-checker-tool
farazdagi e4c791c
ignore last empty line
farazdagi 37e2add
Merge branch 'spec-checker-tool' of github.com:prysmaticlabs/prysm in…
farazdagi 2c56ab7
Merge branch 'develop' into spec-checker-tool
farazdagi 27b70ca
Merge branch 'develop' into spec-checker-tool
farazdagi f8f7ee8
Merge branch 'develop' into spec-checker-tool
farazdagi 0cc6fcc
Merge branch 'develop' into spec-checker-tool
rauljordan d8bc1d1
Merge branch 'develop' into spec-checker-tool
farazdagi 85dde5a
Merge branch 'develop' into spec-checker-tool
farazdagi fe41f5a
Merge branch 'develop' into spec-checker-tool
farazdagi 8cfa92a
Merge branch 'develop' into spec-checker-tool
farazdagi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
load("@io_bazel_rules_go//go:def.bzl", "go_binary") | ||
load("@prysm//tools/go:def.bzl", "go_library") | ||
|
||
go_library( | ||
name = "go_default_library", | ||
srcs = [ | ||
"check.go", | ||
"download.go", | ||
"main.go", | ||
], | ||
embedsrcs = [ | ||
"data/specs/phase0/beacon-chain.md", | ||
"data/specs/phase0/fork-choice.md", | ||
"data/specs/phase0/validator.md", | ||
"data/specs/phase0/weak-subjectivity.md", | ||
"data/ssz/merkle-proofs.md", | ||
"data/extra.md", | ||
], | ||
importpath = "github.com/prysmaticlabs/prysm/tools/specs-checker", | ||
visibility = ["//visibility:public"], | ||
deps = ["@com_github_urfave_cli_v2//:go_default_library"], | ||
) | ||
|
||
go_binary( | ||
name = "specs-checker", | ||
embed = [":go_default_library"], | ||
visibility = ["//visibility:public"], | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Specs checker tool | ||
|
||
This simple tool helps downloading and parsing [ETH2 specs](https://github.com/ethereum/eth2.0-specs/tree/dev/specs), | ||
to be later used for making sure that our reference comments match specs definitions precisely. | ||
|
||
### Updating the reference specs | ||
See `main.go` for a list of files to be downloaded, currently: | ||
```golang | ||
var specDirs = map[string][]string{ | ||
"specs/phase0": { | ||
"beacon-chain.md", | ||
"fork-choice.md", | ||
"validator.md", | ||
"weak-subjectivity.md", | ||
}, | ||
"ssz": { | ||
"merkle-proofs.md", | ||
}, | ||
} | ||
``` | ||
|
||
To download/update specs: | ||
```bash | ||
bazel run //tools/specs-checker download -- --dir=$PWD/tools/specs-checker/data | ||
``` | ||
|
||
This will pull the files defined in `specDirs`, parse them (extract Python code snippets, discarding any other text), | ||
and save them to the folder from which `bazel run //tools/specs-checker check` will be able to embed. | ||
|
||
### Checking against the reference specs | ||
|
||
To check whether reference comments have the matching version of Python specs: | ||
```bash | ||
bazel run //tools/specs-checker check -- --dir $PWD/beacon-chain | ||
bazel run //tools/specs-checker check -- --dir $PWD/validator | ||
bazel run //tools/specs-checker check -- --dir $PWD/shared | ||
``` | ||
Or, to check the whole project: | ||
```bash | ||
bazel run //tools/specs-checker check -- --dir $PWD | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,176 @@ | ||
package main | ||
|
||
import ( | ||
"fmt" | ||
"go/ast" | ||
"go/parser" | ||
"go/token" | ||
"os" | ||
"path" | ||
"path/filepath" | ||
"regexp" | ||
"strings" | ||
|
||
"github.com/urfave/cli/v2" | ||
) | ||
|
||
// Regex to find Python's "def". | ||
var reg1 = regexp.MustCompile(`def\s(.*)\(.*`) | ||
|
||
// checkNumRows defines whether tool should check that the spec comment is the last comment of the block, so not only | ||
// it matches the reference snippet, but it also has the same number of rows. | ||
const checkNumRows = false | ||
|
||
func check(cliCtx *cli.Context) error { | ||
// Obtain reference snippets. | ||
defs, err := parseSpecs() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
// Walk the path, and process all contained Golang files. | ||
fileWalker := func(path string, info os.FileInfo, err error) error { | ||
if info == nil { | ||
return fmt.Errorf("invalid input dir %q", path) | ||
} | ||
if !strings.HasSuffix(info.Name(), ".go") { | ||
return nil | ||
} | ||
return inspectFile(path, defs) | ||
} | ||
return filepath.Walk(cliCtx.String(dirFlag.Name), fileWalker) | ||
} | ||
|
||
func inspectFile(path string, defs map[string][]string) error { | ||
// Parse source files, and check the pseudo code. | ||
fset := token.NewFileSet() | ||
file, err := parser.ParseFile(fset, path, nil, parser.ParseComments) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
ast.Inspect(file, func(node ast.Node) bool { | ||
stmt, ok := node.(*ast.CommentGroup) | ||
if !ok { | ||
return true | ||
} | ||
// Ignore comment groups that do not have python pseudo-code. | ||
chunk := stmt.Text() | ||
if !reg1.MatchString(chunk) { | ||
return true | ||
} | ||
|
||
// Trim the chunk, so that it starts from Python's "def". | ||
loc := reg1.FindStringIndex(chunk) | ||
chunk = chunk[loc[0]:] | ||
|
||
// Find out Python function name. | ||
defName, defBody := parseDefChunk(chunk) | ||
if defName == "" { | ||
fmt.Printf("%s: cannot parse comment pseudo code\n", fset.Position(node.Pos())) | ||
return false | ||
} | ||
|
||
// Calculate differences with reference implementation. | ||
refDefs, ok := defs[defName] | ||
if !ok { | ||
fmt.Printf("%s: %q is not found in spec docs\n", fset.Position(node.Pos()), defName) | ||
return false | ||
} | ||
if !matchesRefImplementation(defName, refDefs, defBody, fset.Position(node.Pos())) { | ||
fmt.Printf("%s: %q code does not match reference implementation in specs\n", fset.Position(node.Pos()), defName) | ||
return false | ||
} | ||
|
||
return true | ||
}) | ||
|
||
return nil | ||
} | ||
|
||
// parseSpecs parses input spec docs into map of function name -> array of function bodies | ||
// (single entity may have several definitions). | ||
func parseSpecs() (map[string][]string, error) { | ||
loadSpecsFile := func(sb *strings.Builder, specFilePath string) error { | ||
chunk, err := specFS.ReadFile(specFilePath) | ||
if err != nil { | ||
return fmt.Errorf("cannot read specs file: %w", err) | ||
} | ||
_, err = sb.Write(chunk) | ||
if err != nil { | ||
return fmt.Errorf("cannot copy specs file: %w", err) | ||
} | ||
return nil | ||
} | ||
|
||
// Traverse all spec files, and aggregate them within as single string. | ||
var sb strings.Builder | ||
for dirName, fileNames := range specDirs { | ||
for _, fileName := range fileNames { | ||
if err := loadSpecsFile(&sb, path.Join("data", dirName, fileName)); err != nil { | ||
return nil, err | ||
} | ||
} | ||
} | ||
|
||
// Load file with extra definitions (this allows us to use pseudo-code that is not from specs). | ||
if err := loadSpecsFile(&sb, path.Join("data", "extra.md")); err != nil { | ||
return nil, err | ||
} | ||
|
||
// Parse docs into function name -> array of function bodies map. | ||
chunks := strings.Split(strings.ReplaceAll(sb.String(), "```python", ""), "```") | ||
defs := make(map[string][]string, len(chunks)) | ||
for _, chunk := range chunks { | ||
defName, defBody := parseDefChunk(chunk) | ||
if defName == "" { | ||
continue | ||
} | ||
defs[defName] = append(defs[defName], defBody) | ||
} | ||
return defs, nil | ||
} | ||
|
||
// parseDefChunk extract function name and function body from a Python's "def" chunk. | ||
func parseDefChunk(chunk string) (string, string) { | ||
chunk = strings.TrimLeft(chunk, "\n") | ||
if chunk == "" { | ||
return "", "" | ||
} | ||
chunkLines := strings.Split(chunk, "\n") | ||
// Ignore all snippets, that do not define functions. | ||
if chunkLines[0][:4] != "def " { | ||
return "", "" | ||
} | ||
defMatches := reg1.FindStringSubmatch(chunkLines[0]) | ||
if len(defMatches) < 2 { | ||
return "", "" | ||
} | ||
return strings.Trim(defMatches[1], " "), chunk | ||
} | ||
|
||
// matchesRefImplementation compares input string to reference code snippets (there might be multiple implementations). | ||
func matchesRefImplementation(defName string, refDefs []string, input string, pos token.Position) bool { | ||
for _, refDef := range refDefs { | ||
refDefLines := strings.Split(strings.TrimRight(refDef, "\n"), "\n") | ||
inputLines := strings.Split(strings.TrimRight(input, "\n"), "\n") | ||
|
||
matchesPerfectly := true | ||
for i := 0; i < len(refDefLines); i++ { | ||
a, b := strings.Trim(refDefLines[i], " "), strings.Trim(inputLines[i], " ") | ||
if a != b { | ||
matchesPerfectly = false | ||
break | ||
} | ||
} | ||
// Mark potential issues, when there's some more comments in our code (which might be ok, as we are not required | ||
// to put specs comments as the last one in the doc block). | ||
if checkNumRows && len(refDefLines) != len(inputLines) { | ||
fmt.Printf("%s: %q potentially has issues (comment is longer than reference implementation)\n", pos, defName) | ||
} | ||
if matchesPerfectly { | ||
return true | ||
} | ||
} | ||
return false | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
```python | ||
def Sign(SK: int, message: Bytes) -> BLSSignature | ||
``` | ||
```python | ||
def Verify(PK: BLSPubkey, message: Bytes, signature: BLSSignature) -> bool | ||
``` | ||
```python | ||
def AggregateVerify(pairs: Sequence[PK: BLSPubkey, message: Bytes], signature: BLSSignature) -> bool | ||
``` | ||
```python | ||
def FastAggregateVerify(PKs: Sequence[BLSPubkey], message: Bytes, signature: BLSSignature) -> bool | ||
``` | ||
```python | ||
def Aggregate(signatures: Sequence[BLSSignature]) -> BLSSignature | ||
``` |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check that all refDefs match perfectly or just any of them match perfectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I loop through all versions of a given definition (say, in
altair
we have the function with the same name but different implementation, I want to make sure that both comments in altair-related code and phase0 related code are checked).So, at the beginning of loop that checks that all definition's lines match our comment, the
matchesPerfectly
flag is set totrue
. Line checking loop breaks when non-matching line is found, thematchesPerfectly
flag is set tofalse
. So, a single matching definition is enough for flag to survive, and us be sure that our comment matches at least one known version of spec's function.NB: Will search rules go for an issue, and if none exists will create my own -- somehow missed that part, thanks for pointing that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go_embed_data
is mentioned in go: support go:embed directives in 1.16 bazelbuild/rules_go#2775). So, it seems we can implement the checker as CI static code analyzer with go_embed_data, do you think it is worth implementing it right now, or should we wait for update onrules_go
side?