
Make IBeatmap DeepCloneable #25467

Open · wants to merge 2 commits into master
Conversation

@default0 (Contributor) commented Nov 15, 2023

Mostly what it says in the title.

This makes IBeatmap implement IDeepCloneable.
Several implementation notes / alternatives:

  • I updated IDeepCloneable to contain a method taking a dictionary so that references can be preserved
  • I added an extension method for convenience of calling it without any arguments
  • Realm objects get detached before being cloned
  • MemberwiseClone is used where appropriate
  • Clone functions are implemented by hand
    • This necessitates updating these if new members get added to the classes in question
    • There are several patterns that are often similar/identical between clone methods, ideas to avoid these repetitions are highly welcome
    • Alternatives include AutoMapper or source-generator-based cloning; however, there is a lot of nontrivial decision-making about what to deep clone and what to copy by reference (e.g. as an optimization for immutable subtypes, or because certain subtrees are not relevant in the contexts where cloning happens)
      • AutoMapper in principle supports all of these, but would require extensive configuration in a centralized place to achieve matching behaviour - I consider it more appropriate to keep the relevant logic in the classes where the decisions are made, rather than in a centralized place
      • Hand-written clones also technically save the overhead of initializing AutoMapper, though I do not consider this particularly important
      • I could not find any off-the-shelf source generators feature-rich enough to support this, and @smoogipoo mentioned source generators not being performant enough to include in builds, so I decided against pursuing this approach further
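The interface shape described in the notes above could look roughly like the following sketch. This is an illustration only: the names follow the PR description, but the details are assumptions rather than the PR's actual code.

```csharp
using System.Collections.Generic;

// Sketch of the described shape (assumed, not the PR's exact code).
// The dictionary maps already-cloned originals to their clones so that
// shared references and cycles are preserved across the cloned object graph.
public interface IDeepCloneable<T> where T : class
{
    T DeepClone(IDictionary<object, object> referenceLookup);
}

public static class DeepCloneableExtensions
{
    // Convenience entry point: start a clone operation with a fresh lookup.
    public static T DeepClone<T>(this T obj) where T : class, IDeepCloneable<T>
        => obj.DeepClone(new Dictionary<object, object>());
}

// Minimal example implementor.
public class Sample : IDeepCloneable<Sample>
{
    public int Value;

    public Sample DeepClone(IDictionary<object, object> referenceLookup)
    {
        // If this object was already cloned elsewhere in the graph, reuse that clone.
        if (referenceLookup.TryGetValue(this, out object existing))
            return (Sample)existing;

        var clone = new Sample { Value = Value };
        referenceLookup[this] = clone;
        return clone;
    }
}
```

With this split, callers simply write `obj.DeepClone()`, while clone implementations thread the lookup through recursive calls.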

I do not consider this PR ready for merging; I want it to serve as a basis and prototype for discussing how to deal with cloning of beatmaps.

It is notable that while working on this, I noticed many places in the code that have to choose very carefully which shallow clones to perform, and when, to avoid bugs/errors - as well as lots of defensive reloading of data. Without having reviewed these in detail, I suspect many of them could be greatly simplified by a reliable way to create an isolated, purely in-memory copy of a beatmap.

I have not checked the performance characteristics of this prototype in detail, but I did not notice any difference in loading times on my laptop (i7-1360P).

@bdach (Collaborator) left a comment


First vague pass at structure. Seems pretty good. There is some complexity involved, but I don't see it being easily avoidable.

Comment on lines +23 to +26
public T DeepClone()
{
    return DeepClone(new Dictionary<object, object>());
}
Collaborator


Probably would add a remark to xmldoc as to what this is

Comment on lines +8 to +12
public static T DeepClone<T>(this T obj)
    where T : class, IDeepCloneable<T>
{
    return obj.DeepClone();
}
Collaborator


Also maybe xmldoc the purpose of this one and why it's separate from the interface method.

Comment on lines 135 to 137
Head = hold.NestedHitObjects.OfType<HeadNote>().Single();
Tail = hold.NestedHitObjects.OfType<TailNote>().Single();
Body = hold.NestedHitObjects.OfType<HoldNoteBody>().Single();
Collaborator


Suggested change
Head = hold.NestedHitObjects.OfType<HeadNote>().Single();
Tail = hold.NestedHitObjects.OfType<TailNote>().Single();
Body = hold.NestedHitObjects.OfType<HoldNoteBody>().Single();
Head = NestedHitObjects.OfType<HeadNote>().Single();
Tail = NestedHitObjects.OfType<TailNote>().Single();
Body = NestedHitObjects.OfType<HoldNoteBody>().Single();

};

referenceLookup[this] = clone;
Collaborator


should be before the Frames.Select(f => f.DeepClone(...)) call
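The ordering concern in this comment generalizes: for object graphs with back-references, a clone must be registered in the lookup before any children are cloned, otherwise a child that clones its parent reference would recurse forever. A minimal, hypothetical illustration (not osu! code):

```csharp
using System.Collections.Generic;

// Hypothetical parent/child pair with a back-reference, illustrating why the
// clone must be registered in referenceLookup *before* its children are cloned.
public class Parent
{
    public readonly List<Child> Children = new List<Child>();

    public Parent DeepClone(IDictionary<object, object> referenceLookup)
    {
        if (referenceLookup.TryGetValue(this, out object existing))
            return (Parent)existing;

        var clone = new Parent();
        referenceLookup[this] = clone; // register first!

        foreach (var child in Children)
            clone.Children.Add(child.DeepClone(referenceLookup));

        return clone;
    }
}

public class Child
{
    public Parent Parent;

    public Child DeepClone(IDictionary<object, object> referenceLookup)
    {
        if (referenceLookup.TryGetValue(this, out object existing))
            return (Child)existing;

        var clone = new Child();
        referenceLookup[this] = clone;
        // Finds the parent clone registered above, preserving the cycle
        // instead of recursing back into Parent.DeepClone indefinitely.
        clone.Parent = Parent?.DeepClone(referenceLookup);
        return clone;
    }
}
```

If `referenceLookup[this] = clone` were moved after the child-cloning loop, the child's `Parent?.DeepClone(...)` call would miss the lookup and re-enter the parent's clone, causing unbounded recursion.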


protected virtual HitObject CreateInstance() => new HitObject();

protected virtual void CopyFrom(HitObject other, IDictionary<object, object> referenceLookup = null)
Collaborator


referenceLookup does not need to be nullable (will require updating overrides too)

Header = Header?.DeepClone(referenceLookup)
};

referenceLookup[this] = clone;
Collaborator


see the other review comment (should be inserted before Header deep clone)

@smoogipoo (Contributor) commented

I should have asked before you started going deep into this, but where is this going to be used?

@bdach

This comment was marked as outdated.

@default0 (Contributor, Author) commented

Rereading this, I don't think the details are quite accurate.
The original data flow regarding stack leniency is something like: EditorBeatmap gets created, but the BeatmapInfo used inside EditorBeatmap is the one from the WorkingBeatmap directly (instead of the one from the playable beatmap used inside EditorBeatmap).

This breaks the stack leniency behaviour (i.e. changes to stack leniency do not get applied except when exiting and re-entering the editor).
Updating the EditorBeatmap to instead use the BeatmapInfo from the playable beatmap (and telling it to re-apply beatmap processors on change) fixes stack leniency not updating, but causes any changes to the BeatmapInfo (I usually tested by changing circle size in the difficulty settings) to not be saved when saving the beatmap.

The reason they do not get saved actually has nothing to do with Realm: when saving, the data that ultimately gets persisted is fetched through the following reference chain: EditorBeatmap -> BeatmapInfo -> BeatmapSetInfo -> Beatmaps -> BeatmapInfo.

Normally, following BeatmapInfo -> BeatmapSetInfo -> Beatmaps -> BeatmapInfo should yield the same instance (i.e. ReferenceEquals) as the original BeatmapInfo (when looking at the appropriate beatmap of the set). However, the shallow copy created during GetPlayableBeatmap breaks this: since the BeatmapSetInfo referred to by the BeatmapInfo is shared by reference while the BeatmapInfo itself is a copy, going back from BeatmapSetInfo actually gives you a different (unchanged) BeatmapInfo.
Since this is the reference path chosen while saving, changes made to the BeatmapInfo are no longer persisted.
This is currently avoided by passing in the BeatmapInfo from WorkingBeatmap (because there the reference chain is intact, as it should be); however, this has other annoying side effects (such as the stack leniency issue).

It also generalizes: as @bdach already mentioned, having to chase down "which specific copy/object is this" in order for certain functions not to break outright, despite the objects being semantically equivalent, is not good.

Hence, DeepClone with a mechanism to preserve references ultimately fixes this (it keeps the above reference chain fully intact when creating the PlayableBeatmap) while also providing a much cleaner way to fully isolate the object graph the editor works with.
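The broken back-reference chain described above can be reproduced with simplified stand-in types (hypothetical, not the actual osu! classes):

```csharp
using System.Collections.Generic;

// Simplified stand-ins demonstrating how a shallow copy breaks the
// Info -> Set -> Beatmaps -> Info round trip.
public class SetInfo
{
    public readonly List<Info> Beatmaps = new List<Info>();
}

public class Info
{
    public SetInfo Set;
    public float CircleSize;

    // Shallow copy: the Info itself is duplicated, but Set is shared by
    // reference - and Set.Beatmaps still points at the *original* Info.
    public Info ShallowClone() => (Info)MemberwiseClone();
}
```

After `var copy = info.ShallowClone()`, walking `copy.Set.Beatmaps` yields the original `info`, so any edits made on `copy` are invisible along that path - which mirrors why edited difficulty settings were not persisted on save.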

@peppy self-requested a review on November 17, 2023 at 04:08
Comment on lines +233 to +246
protected virtual HitObject CreateInstance() => new HitObject();

protected virtual void CopyFrom(HitObject other, IDictionary<object, object> referenceLookup)
{
    StartTime = other.StartTime;
    Samples = other.Samples.Select(s => s.DeepClone(referenceLookup)).ToList();
    Kiai = other.Kiai;

    // HitWindows is immutable, so we can copy by reference
    HitWindows = other.HitWindows;

    nestedHitObjects.Clear();
    nestedHitObjects.AddRange(other.nestedHitObjects.Select(h => h.DeepClone(referenceLookup)));
}
Member


I dunno. This is to my eyes nothing short of shit.

We can't expect every ruleset to implement this stuff, let alone get it right.

Case in point: test failures from missing CreateInstance implementations or incorrect CopyFrom implementations.

Contributor Author


Agreed. We can try to find simplifications or move to AutoMapper, but the end result will always be that there's annoying complexity around cloning objects. Simplifying hitobjects and related objects internally would go a long way towards allowing much more straightforward cloning logic. Maybe go in that direction before trying to implement this?

return clone;
}

protected virtual HitObject CreateInstance() => new HitObject();
Member


Is there a reason we can't use

Suggested change
protected virtual HitObject CreateInstance() => new HitObject();
protected HitObject CreateInstance() => (HitObject)Activator.CreateInstance(GetType())!;

Contributor Author


Certain subtypes do not provide parameterless constructors or otherwise require some consideration about how they get constructed during cloning, so this tends to be more straightforward.

Member


I see. For reference, here are a few examples:

SliderEndCircle
StrongNestedHit

Member

@peppy commented Nov 17, 2023


To fix tests (and maybe relieve the overhead for ruleset creators), the base implementation could be left virtual but with a default implementation as I propose. This would fix the countless cases of TestHitObject, and also make a lot of the manual implementations redundant. To handle the edge cases, it could be something like:

        protected virtual HitObject CreateInstance()
        {
            try
            {
                return (HitObject)Activator.CreateInstance(GetType());
            }
            catch
            {
                throw new NotImplementedException($"Hit object of type {GetType().ReadableName()} requires constructor parameters but does not override {nameof(CreateInstance)}.");
            }
        }

Contributor


I would like to avoid reflection altogether because it's not easily AOT-able.

@peppy
Member

peppy commented Nov 17, 2023

Regardless of forward direction, I'd like to see this PR passing all tests as a ground truth example that it can work. Some of the current failures are very hard to track back to causation, which is scary to me.

@default0
Contributor Author

I was planning on fixing the test failures this evening - will report back on how that goes when I've finished.

@default0
Contributor Author

Alright, I checked the tests.
Some of the failures are trivial (e.g. missing null checks in DeepClone implementations), but there are also some much deeper structural issues.

Might be easier to follow with an example, so I will explain using TestSceneHyperDash as my example:
TestSceneHyperDash contains the following assertion:

AddAssert("First note is hyperdash", () => Beatmap.Value.Beatmap.HitObjects[0] is Fruit f && f.HyperDash);

The above assertion fails. However, inspecting the test visually shows the hyperdash functioning perfectly fine. The reason the test fails is that Beatmap.Value is the WorkingBeatmap, and fetching the Beatmap instance off it returns a non-converted, non-processed (i.e. only decoded) Beatmap instance.
Since GetPlayableBeatmap() now involves a deep instead of a shallow clone, the changes applied to hit objects during ruleset-specific beatmap processing/conversion are no longer visible on the WorkingBeatmap.
As a result, the hyperdash is only applied to the playable beatmap generated inside Player. This playable beatmap currently does not get exposed, so there is no straightforward way to fix the test.
Simply exposing it on Player is very confusing imo, as OsuScreen already contains the WorkingBeatmap bindable simply called "Beatmap" - having both "PlayableBeatmap" and "Beatmap" available on a screen would be quite strange.

After discussing a bit with @bdach, the cleanest way forward we could come up with is to simply not expose the Beatmap property on WorkingBeatmap and instead force people to use GetPlayableBeatmap() to get properly constructed beatmaps off WorkingBeatmap. WorkingBeatmap.Beatmap changing its internal state as a side effect of calling GetPlayableBeatmap() is weird and unintuitive - and DeepClone removing this side effect is what's causing the broken test.

However, pursuing this direction would bloat this PR way too much (because it would require lots of follow-up changes), so instead of adding even more refactoring on top, @bdach suggested (and I agree it makes sense in the name of keeping this PR focused on the cloning) reviving the original shallow clone function and only using DeepClone in the editor for now.
Doing so would also fix the tests that depend on WorkingBeatmap.Beatmap being internally modified when calling WorkingBeatmap.GetPlayableBeatmap() - and properly following through on the distinction between WorkingBeatmap and IBeatmap can then be done in a separate PR at a later time.

@default0
Contributor Author

Just to be clear: I will pursue all the individual test failure cases to track down what's causing them. From the ones I have got to so far, there seems to be a general theme of assumptions about things not being cloned / being ReferenceEquals in ways that DeepClone breaks by design.

Since these sorts of things tend to not be straightforward to track down it will take me a bit to work through all of them.
When I am done, I will provide a detailed writeup for all the individual failure cases that I believe to be a failure of general test architecture/design rather than issues with DeepClone itself (similar to the above writeup on TestSceneHyperDash) and fix all the ones that are straightforward or isolated enough to take care of.

@smoogipoo
Contributor

I'm not sure that the complexity surrounding cyclic references is worth it - is this actually a problem? Because I would expect manual cloning code to be able to avoid this 100% of the time.

Also I think it's best to keep the structure simple so we're not dealing with 3 new methods (DeepClone, CreateInstance, and CopyFrom), such as:

public class HitObject
{
    public HitObject DeepClone()
    {
        HitObject newObj = (HitObject)this.MemberwiseClone();

        // Adjust reference types in newObj...

        return newObj;
    }
}

At least, this is a pretty standard way of doing it.

The other thing is, I'm not sure this is much better than binary (or other, but probably not JSON) serialising/deserialising of hitobjects, which we're going to need to do in some fashion anyway.
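For illustration, the serialisation-based alternative mentioned here could look like the following minimal sketch. System.Text.Json is used purely for demonstration (the comment suggests a binary format would be preferable in practice), and the `Note` type is a hypothetical example:

```csharp
using System.Text.Json;

// Sketch of clone-via-serialisation. Note that a plain JSON round trip does
// not preserve reference identity or cycles; ReferenceHandler.Preserve (set
// via JsonSerializerOptions) would be needed for graphs with shared references.
public static class SerializationClone
{
    public static T DeepClone<T>(T obj)
        => JsonSerializer.Deserialize<T>(JsonSerializer.Serialize(obj))!;
}

// Example type: public settable properties and a parameterless constructor
// are required for this approach to round-trip correctly.
public class Note
{
    public double StartTime { get; set; }
}
```

The trade-off relative to hand-written clones is exactly the one discussed above: serialisation is uniform and low-maintenance, but gives no per-member control over what to deep clone versus share by reference.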

@peppy
Member

peppy commented Nov 24, 2023

After some discussion IRL yesterday, I think the only way forward here is to start with stand-alone tests that show what is trying to be achieved here.

As in, a test which constructs a BeatmapInfo, clones it, then points out any issues that led to this change being proposed in the first place. Once we have tests set up to cover expectations, we can then test other potential candidates besides the manual approach (serialisation, AutoMapper) and get an idea of what the most maintainable solution going forward is.

@default0
Contributor Author

Sorry for the long silence on this, been pretty busy for a bit - I expect I should have more time next week, but will have to see.

public class HitObject
{
    public HitObject DeepClone()
    {
        HitObject newObj = (HitObject)this.MemberwiseClone();

        // Adjust reference types in newObj...

        return newObj;
    }
}

At least, this is a pretty standard way of doing it.

It would be nice if the above worked; however, if you have readonly reference-type fields initialized via an object passed into the constructor, that pattern does not work. I do not remember exactly where off the top of my head, but I am pretty sure I encountered more than one such place throughout the classes that needed to be cloned.
There are also instances where a class being cloned cannot know whether it is being cloned in the context of another class that contains it, so even with MemberwiseClone() the "// Adjust reference types in newObj..." portion of the code would still require the referenceLookup (to check whether any of the memberwise-cloned references needs to be updated to faithfully preserve cycles in the cloned object graph).
I'm down to refactor/simplify those, of course, but I wanted this PR to focus on cloning, not on meddling with the HitObject (and other) model classes.
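The readonly-field limitation mentioned above can be shown with a small hypothetical example (not an actual osu! class):

```csharp
using System.Collections.Generic;

// MemberwiseClone() copies the readonly reference as-is, so the "clone"
// still shares mutable state with the original, and the field cannot be
// reassigned after construction to fix this up.
public class Path
{
    public readonly List<int> ControlPoints;

    public Path(List<int> controlPoints)
    {
        ControlPoints = controlPoints;
    }

    public Path BrokenClone()
    {
        var clone = (Path)MemberwiseClone();
        // clone.ControlPoints = new List<int>(ControlPoints); // illegal: readonly
        return clone; // still shares ControlPoints with the original
    }

    // Working approach: go through the constructor instead.
    public Path DeepClone() => new Path(new List<int>(ControlPoints));
}
```

This is why subtypes with constructor-initialized readonly state need a hand-written path (or a CreateInstance-style hook) rather than a generic MemberwiseClone-based clone.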

As in a test which constructs a BeatmapInfo, clones, then points out any issues that led to this change being proposed in the first place. Once we have tests setup to cover expectations, we can then tests other potential candidates instead of the manual approach (serialisation, automapper) and get an idea of what the most maintainable forward solution is.

I have tried to explain above which kinds of reference chains are currently being broken by shallow cloning, but I am happy to add tests that showcase these in code if that is necessary to highlight the problems with the current approach.

I plan to fix all tests first before moving ahead with adding the ones showing off the new behaviour - ideally, I would much prefer if we could simplify the model to the point where straightforward serialization/deserialization or AutoMapper can be used instead of hand-written clone methods, but I do not consider the necessary simplifications/alterations to the model classes to be at all straightforward in the current state.

For the time being, please hold until I can get around to fixing and adding tests :-)

5 participants