Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve JsonPatch diff performances #331

Merged
merged 9 commits into from
Mar 2, 2023
Merged

Conversation

satabin
Copy link
Member

@satabin satabin commented Mar 2, 2023

The diffing algorithm was really slow in case of big collections and deeply nested structures.
The main reasons were:

  • the JSON equality test done at the beginning wen through the entire JSON structure figuring out the deeply nested values were not equal
    • this equality is now performed only if we know both JSONs are not arrays or objects
  • hashes were recomputed over and over again in the Patience LCS computation
    • this was due to a refactoring losing the fact that we use cached hashes, so this switched back to using the hash caching LCS.
  • a lot of list concatenation append, each being O(n) in best case (reverse_:::)
    • switching to Chain greatly improves collection appends
  • when diffing objects, fields were tested for equality at each level, resulting in doing the recursive equality check again and again
    • the object diffing method now traverses the objects only once, computing equalities on the go

Moreover on deeply nested structures, the call stack was exploding. Since the diff methods are mutually recursive, tail recursion elimination was not easy to achieve. The new implementation switches to Eval internally to make the recursive calls stack safe.

Below the results of the benchmarks for big arrays and deeply nested objects.

Before:

Benchmark                             Mode  Cnt     Score    Error  Units
PatienceBenchmarks.diffArray         thrpt    5  1469.334 ± 20.458  ops/s
PatienceBenchmarks.diffDeep          thrpt    5   493.050 ± 14.758  ops/s

After:

Benchmark                             Mode  Cnt     Score    Error  Units
PatienceBenchmarks.diffArray         thrpt    5  1857.889 ± 40.768  ops/s
PatienceBenchmarks.diffDeep          thrpt    5  1453.636 ± 33.049  ops/s

@satabin satabin requested a review from ybasket March 2, 2023 17:53
@satabin satabin added this to the 4.4.0 milestone Mar 2, 2023
@satabin satabin enabled auto-merge March 2, 2023 18:47
Copy link
Collaborator

@ybasket ybasket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some mostly syntactic suggestions and one remark about dependencies, but looks good overall 👍

.settings(
name := "diffson-benchmarks",
libraryDependencies ++= Seq(
"io.circe" %% "circe-literal" % circeVersion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"io.circe" %% "circe-literal" % circeVersion
"io.circe" %% "circe-core" % circeVersion

You don't seem to use literal, just syntax from core.

_.prepend(Remove(path / fld, if (rememberOld) Some(value1) else None)))
}
case Nil =>
Eval.now(Chain.fromSeq(fields2.toList).map { case (fld, value) => Add(path / fld, value) })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Eval.now(Chain.fromSeq(fields2.toList).map { case (fld, value) => Add(path / fld, value) })
Eval.now(Chain.fromSeq(fields2.toList.map { case (fld, value) => Add(path / fld, value) }))

I'm pretty sure map is slightly faster on List. I also wonder whether one can go more directly from Map to Chain, but you probably tried already.

def remove(from: Int, until: Int, shift: Int, arr: List[Json]): Chain[Operation[Json]] =
Chain.fromSeq(
for (idx <- until to from by -1)
yield Remove[Json](path / idx, if (rememberOld) Some(arr(idx - shift)) else None))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
yield Remove[Json](path / idx, if (rememberOld) Some(arr(idx - shift)) else None))
yield Remove[Json](path / idx, Option.when(rememberOld)(arr(idx - shift))))

case None =>
// field is not in the second object, delete it
fieldsDiff(fields1, fields2, path).map(
_.prepend(Remove(path / fld, if (rememberOld) Some(value1) else None)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_.prepend(Remove(path / fld, if (rememberOld) Some(value1) else None)))
_.prepend(Remove(path / fld, Option.when(rememberOld)(value1))))

@satabin satabin added this pull request to the merge queue Mar 2, 2023
Merged via the queue into main with commit 6bdff5b Mar 2, 2023
@satabin satabin deleted the performances/jsonpatch-diff branch March 2, 2023 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants