Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index fix #13

Merged
merged 4 commits into from
Nov 26, 2022
Merged

Index fix #13

merged 4 commits into from
Nov 26, 2022

Conversation

TripleDogDare
Copy link
Contributor

@TripleDogDare TripleDogDare commented Nov 23, 2022

The way the indexing is built, siblings can end up missing from the ChildrenList. I'm not sure exactly how the incorrect children list is created, but switching to pointers makes the problem go away.
I think it's because the parent ChildrenList and the ChildrenList of a child in the parent's Children map are referencing different entities. However, looking at the code it appears that this isn't possible and it makes my head hurt a little.

Below is how the new test shows the failure.

$ go test ./...
--- FAIL: TestIndexingTree (0.00s)
    index_test.go:97: 
        error:
          values are not equal
        comment:
          one
        comment:
          list: [two three]
        comment:
          keys: [two three four]
        got:
          int(3)
        want:
          int(2)
        stack:
          /home/cjb/repos/warpforge/go-testmark/index_test.go:97
            qt.Check(t, len(dir.Children), qt.Equals, len(dir.ChildrenList),
                // If the lengths are equal then the map and list should contain entries with the same names.
                // We don't know if the dir entries are _actually_ equivalent but the test recurses above so it should be fine.
                qt.Commentf("%s", dir.Name),
                qt.Commentf("list: %v", names(dir.ChildrenList)),
                qt.Commentf("keys: %v", keys(dir.Children)),
            )
          /home/cjb/repos/warpforge/go-testmark/index_test.go:95
            assertChildren(t, child)
          /home/cjb/repos/warpforge/go-testmark/index_test.go:66
            assertChildren(t, *doc.DirEnt)
        
FAIL
FAIL	github.com/warpfork/go-testmark	0.003s
ok  	github.com/warpfork/go-testmark/testexec	(cached)
FAIL

Here's an absurdly large chunk of json that shows the structure of the failing bits.

$ jq '{Name: .Name, Children: .Children.one, ChildrenList: .ChildrenList[] | select(.Name == "one")}' log.json

{
  "Name": "",
  "Children": {
    "Name": "one",
    "Children": {
      "four": {
        "Name": "four",
        "Children": {
          "bang": {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        },
        "ChildrenList": [
          {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        ]
      },
      "three": {
        "Name": "three",
        "Children": null,
        "ChildrenList": null
      },
      "two": {
        "Name": "two",
        "Children": null,
        "ChildrenList": null
      }
    },
    "ChildrenList": [
      {
        "Name": "two",
        "Children": null,
        "ChildrenList": null
      },
      {
        "Name": "three",
        "Children": null,
        "ChildrenList": null
      },
      {
        "Name": "four",
        "Children": {
          "bang": {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        },
        "ChildrenList": [
          {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        ]
      }
    ]
  },
  "ChildrenList": {
    "Name": "one",
    "Children": {
      "four": {
        "Name": "four",
        "Children": {
          "bang": {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        },
        "ChildrenList": [
          {
            "Name": "bang",
            "Children": null,
            "ChildrenList": null
          }
        ]
      },
      "three": {
        "Name": "three",
        "Children": null,
        "ChildrenList": null
      },
      "two": {
        "Name": "two",
        "Children": null,
        "ChildrenList": null
      }
    },
    "ChildrenList": [
      {
        "Name": "two",
        "Children": null,
        "ChildrenList": null
      },
      {
        "Name": "three",
        "Children": null,
        "ChildrenList": null
      }
    ]
  }
}

@TripleDogDare
Copy link
Contributor Author

TripleDogDare commented Nov 23, 2022

So we've figured out that this is caused by implicit data copying when the ChildrenList exceeds capacity.
You can show this by adding or removing the number of unique children. Prepending the markdown file with a bunch of unique children will hide the behavior and make tests pass.
e.g.

[testmark]:# (split1)
\```
split
\```
[testmark]:# (split2)
\```
split
\```
[testmark]:# (split3)
\```
split
\```
[testmark]:# (split4)
\```
split
\```
[testmark]:# (split5)
\```
split
\```

... original file ...

@TripleDogDare
Copy link
Contributor Author

There's no good way to resolve this problem. Making the ChildrenList a slice of pointers instead of structs seems the least disruptive to existing users, but is likely to require some code changes downstream.

@warpfork
Copy link
Owner

warpfork commented Nov 23, 2022

I agree. There's no good way I can see to fix this without changing the signature of ChildenList to be a slice of pointers. That's unfortunate because it's an exported field, which means changing it is a breaking change. Nonetheless, it seems necessary.

We can hope that it's mostly a no-op change, at the syntactic level, to the downstream consumers? I think often it might be. A slight blessing if so.

(Considered and discarded: if only the map that's indexing children by name was constructed entirely after the children list -- then it wouldn't be grabbing pointers into a moving target, and there would be no trouble. Alas, the map is being referenced during that construction as well, so there's no easy disentangle.)

@warpfork
Copy link
Owner

warpfork commented Nov 24, 2022

I still have a slightly hard time estimating if the error this is fixing impacted many consumers. As an anecdata point, I just bumped a selected downstream project to use this, and it had 133 tests before, and it still has 133 tests after this bump. So, apparently, the way in which that project ranged over testmark data meant it wasn't really phased. (Or maybe it just hit a magic number that didn't hit the resizing path that encounters the bug. Seems unlikely though, considering it has a number of testmark files containing varying hunk counts and hunk name arrangements.)

Nonetheless, I still agree there's a bug here and it needs fixing, and so I consider this diff on-track to merge.

@TripleDogDare
Copy link
Contributor Author

@warpfork There are now links to known downstream changes.

@warpfork warpfork merged commit 99388b3 into warpfork:master Nov 26, 2022
@warpfork
Copy link
Owner

I'm going to tag a v0.11.0 to include this change immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants