Investigate PQS performance #271

Open
gotmachine opened this issue Oct 18, 2024 · 0 comments
Labels
kspPerformance Possible performance improvement in KSP

Comments

@gotmachine
Contributor

gotmachine commented Oct 18, 2024

PQS updates are relatively light in terms of impact on the average frame time.

Still, there are caveats:

  • Every frame, the subdivision level of every quad is checked. This is relatively fast on stock bodies (less than 0.5% of the frame time), but the cost increases exponentially with the max subdivision level. A higher max subdivision level is common on modded bodies, typically in larger-than-stock-scale systems, but also when a higher vertex density is desired (for example with Parallax, or when high-resolution heightmaps/textures are used).
  • Subdivision itself is pretty slow. Subdivision typically doesn't happen every frame, even when flying close to the ground at very high speeds, so this doesn't show much on the average frame rate, but is a notable source of framerate instability. 1% lows when flying fast can go up to 10ms per frame on stock bodies (on my 5800X3D, figures are likely much worse on weaker CPUs). There is a rather crude work-per-frame limiting mechanism to avoid things getting out of hand, set at 75ms per frame.

On the fixed cost of checking subdivision level (so excluding the work done for subdividing/collapsing quads), there are a few things that could be done:

  • PQ.UpdateTargetRelativity() is called on every quad to compute the distance from the quad to the target (usually the active vessel), which in turn is used to decide if the quad should be subdivided, collapsed or left in its current state. The check doesn't actually compute the straight-line distance but the arc-length, which involves a rather slow Math.Acos() call. Computing that arc-length is roughly 30% of the fixed PQS update time. Using the arc-length might not be such an important refinement in practice. Checking against a simple squared distance would be several orders of magnitude faster...
  • The work-per-frame limiting logic involves calling Time.realtimeSinceStartup on every quad update, to check against the max time allowed. This call is extremely slow and alone accounts for another 30% of the fixed PQS update time. Replacing it with a call to Stopwatch.GetTimestamp() is only very marginally faster, as the two methods go through the same native call stack anyway. Maybe we could be a bit sneaky here: initially measure, let's say a hundred times, how much time a subdivision takes, store that metric and use it to increment a simple counter. It wouldn't be as precise, but probably good enough for the intended purpose.
  • Another 7% of the time is spent repeatedly setting the enabled property of the quad MeshRenderer which, in 99+% of cases, will already be in the right state. It is surely feasible to track that state independently, only setting the property when it has changed.
  • As usual, another 7% is spent checking the destroyed state of Unity objects.
  • And finally, loops over fixed arrays of 4 items. Ideally, these arrays would just be fields, but at the very least we should access the 4 array items directly instead of using a loop.
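
On the arc-length point, one way to drop the Math.Acos() call without changing which quads pass the check: if the arc-length is the sphere radius times the angle between two unit direction vectors (an assumption about how the stock code computes it, not verified here), then the chord between the two surface points grows monotonically with the arc-length. A threshold on the arc-length can therefore be converted once into a threshold on the squared chord, outside the per-quad loop. A minimal sketch, in Python for brevity rather than C#, with hypothetical function names:

```python
import math

def arc_length(a, b, radius):
    """Great-circle distance between unit vectors a and b (needs acos)."""
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return radius * math.acos(dot)

def chord_sq(a, b, radius):
    """Squared straight-line distance between the same two surface points.

    |R*a - R*b|^2 = 2 * R^2 * (1 - dot(a, b)) for unit vectors a and b.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return 2.0 * radius * radius * (1.0 - dot)

def arc_threshold_to_chord_sq(arc_max, radius):
    """Convert an arc-length threshold to a squared-chord threshold.

    Done once per threshold, not once per quad.
    """
    half_angle = min(arc_max / radius, math.pi) / 2.0
    chord = 2.0 * radius * math.sin(half_angle)
    return chord * chord
```

Per quad, only a dot product and a comparison against the precomputed limit remain. Whether the stock thresholds can be precomputed this way depends on how they are derived per subdivision level, which would need checking.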

Overall, this is around 75% of the fixed cost that could be more or less eliminated by addressing those.
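
The calibration idea for the work-per-frame limiter could look roughly like the following. This is only a sketch in Python with hypothetical names (WorkBudget, calibration_samples), not the stock limiter: the first N subdivisions are timed with a real clock, then their average is charged to a plain counter, so the per-quad check no longer touches the clock.

```python
import time

class WorkBudget:
    """Estimate-based per-frame work limiter (hypothetical sketch)."""

    def __init__(self, max_seconds_per_frame, calibration_samples=100,
                 clock=time.perf_counter):
        self.max_seconds = max_seconds_per_frame
        self.calibration_samples = calibration_samples
        self.clock = clock
        self.samples = 0
        self.total_measured = 0.0
        self.estimated_cost = None   # seconds per subdivision, once calibrated
        self.spent_this_frame = 0.0

    def begin_frame(self):
        self.spent_this_frame = 0.0

    def can_subdivide(self):
        # Cheap check done on every quad update: no clock access.
        return self.spent_this_frame < self.max_seconds

    def run(self, subdivide):
        """Run one subdivision and charge its measured or estimated cost."""
        if self.estimated_cost is None:
            # Calibration phase: time the real work.
            start = self.clock()
            subdivide()
            elapsed = self.clock() - start
            self.samples += 1
            self.total_measured += elapsed
            self.spent_this_frame += elapsed
            if self.samples >= self.calibration_samples:
                self.estimated_cost = self.total_measured / self.samples
        else:
            # Steady state: charge the stored average instead of measuring.
            subdivide()
            self.spent_this_frame += self.estimated_cost
```

The estimate goes stale if subdivision cost changes a lot between bodies or PQSMod setups, so re-calibrating on scene or body change would probably be needed.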

On the subdivision side of things:

  • PQS.BuildTangents(PQ quad) is getting/instantiating the whole mesh normals array on every iteration (i.e., for every vertex). This is around 15% of the subdivision time and could be eliminated trivially.
  • It kinda depends on which PQSMods are configured, but roughly 50% of the subdivision time is spent evaluating 3D simplex noise. I've been trying to micro-optimize the stock simplex implementation, but only managed to make it about 20% faster. The real answer here is likely a vectorized / Burst implementation, but it would likely be hard to ensure results are numerically identical to the stock ones, although I'm not sure how needed that is in practice.
  • Other types of noise functions are used in various PQSMods, and they always represent the overwhelming majority of the work done by those PQSMods. They are implemented in a C# port of libnoise. From a quick look, they seem decently optimized already, but I guess a micro-optimization pass could still improve things marginally. The main noise functions used are Voronoi and RidgedMultifractal.
  • There are some micro-optimization opportunities in the texture/heightmap sampling functions, but overall they represent a tiny fraction of the work. One exception is GetPixelCBTextureAtlasPoint(), which represents 12% of the subdivision time and could definitely be optimized. Likewise, the PQS.BuildNormals(PQ quad) method represents 2.5% of the work and could benefit from a micro-optimization pass.
  • There are likely more optimization opportunities in various other methods, but identifying the relevant ones would require more profiling. Profiling modded bodies would be nice too, as they usually don't use the same PQSMods as stock.
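
To illustrate the BuildTangents point: Unity's Mesh.normals getter hands back a fresh copy of the array on every access, so reading it inside the per-vertex loop allocates and copies the whole array once per vertex. The sketch below mimics that in Python. FakeMesh is a hypothetical stand-in for a Unity mesh, and the cross product is a placeholder, not the stock tangent formula.

```python
class FakeMesh:
    """Stand-in for a mesh whose normals getter returns a copy each time."""

    def __init__(self, normals):
        self._normals = list(normals)
        self.copies = 0  # how many times the array was copied out

    @property
    def normals(self):
        self.copies += 1
        return list(self._normals)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

UP = (0.0, 1.0, 0.0)  # arbitrary reference axis for the placeholder math

def tangents_per_vertex_fetch(mesh, count):
    # Reads mesh.normals inside the loop: one full array copy per vertex.
    return [cross(mesh.normals[i], UP) for i in range(count)]

def tangents_hoisted_fetch(mesh, count):
    # Reads mesh.normals once, then indexes the local array.
    normals = mesh.normals
    return [cross(normals[i], UP) for i in range(count)]
```

Both functions return the same tangents; only the number of array copies differs (one per vertex versus one per quad).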

There are a few practical difficulties for implementing all this:

  • Many of the optimization opportunities are in relatively small methods called many times. The overhead of Harmony patching will in some cases completely erase the gains.
  • Especially for the optimizations of the fixed subdivision level checks, it would be necessary to maintain additional/different data. Having that data on the side would involve additional overhead and messy lifecycle tracking.
  • One partial solution to both these problems would be to implement our modifications in derived classes and swap the instances at runtime. This might not be feasible in some cases for modding ecosystem compatibility reasons.

A (very) stretch goal would be to parallelize quad subdivision. The general process definitely lends itself to parallelization, but in practice, given the current architecture, this feels quite difficult to accomplish. And there is the issue that while we could re-implement the stock PQSMods to be thread safe, there are many PQSMods defined in plugins that won't be.

@gotmachine gotmachine added the kspPerformance Possible performance improvement in KSP label Oct 18, 2024