Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# - Subtree Singletons #3740

Closed
jmcguirk opened this issue Jan 22, 2016 · 8 comments
Closed

C# - Subtree Singletons #3740

jmcguirk opened this issue Jan 22, 2016 · 8 comments
Labels

Comments

@jmcguirk
Copy link

Hey guys,

Reviewing the generated C# bindings - I'm wondering if it makes sense to have references that are references to another table instance be generated only once (on demand), right now everycall generates a new class instance.

Here's a concrete example (edited for simplicity)

Schema File:

table EntityTemplate {
  Components:ComponentSet;
}

table ComponentSet {
   InAppPurchase:InAppPurchaseComponent;
}

table InAppPurchaseComponent{
    Platform:string;
    SkuName:string;
        etc...
}

Generated C#

public partial class ComponentSet : Table {
  public static ComponentSet GetRootAsComponentSet(ByteBuffer _bb) { return GetRootAsComponentSet(_bb, new ComponentSet()); }
  public static ComponentSet GetRootAsComponentSet(ByteBuffer _bb, ComponentSet obj) { return (obj.__init(_bb.GetInt(_bb.Position) + _bb.Position, _bb)); }
  public ComponentSet __init(int _i, ByteBuffer _bb) { bb_pos = _i; bb = _bb; return this; }

  public InAppPurchaseComponent InAppPurchase { get { return GetInAppPurchase(new InAppPurchaseComponent()); } }
  public InAppPurchaseComponent GetInAppPurchase(InAppPurchaseComponent obj) { int o = __offset(46); return o != 0 ? obj.__init(__indirect(o + bb_pos), bb) : null; }
   .. etc
}

Notably this line

public InAppPurchaseComponent InAppPurchase { get { return GetInAppPurchase(new InAppPurchaseComponent()); } }

Means anytime we reference this subtree, we're getting a new instance of InAppPurchaseComponent.

This strikes me as not desirable for a few reasons

1.) It generates a fair amount of garbage. Even though these classes are presumably very light weight, there's still a non zero cost here that gets hit everytime the object subtree is traversed
2.) Prevents accurate equality checks - Asking for the same subtree twice and checking for equality (via reference) returns a negative (I would contend intuitively this should probably be true)
3.) Prevents adding additional data (cached or otherwise transient variables) to be added to the subtrees (we're doing this via partials from other thread)

I'm not familiar enough with the underlying internals to know if this was done for a reason or if the member variable can be cached and returned like so

private InAppPurchaseComponent m_InAppPurchaseInstance;

    public InAppPurchaseComponent InAppPurchase { 
        get { 
            if(m_InAppPurchaseInstance == null){
                m_InAppPurchaseInstance = GetInAppPurchase(new InAppPurchaseComponent());
            }
            return m_InAppPurchaseInstance; 
        } 
    }
@ghost
Copy link

ghost commented Jan 27, 2016

It is intended that you call GetInAppPurchase always if you care about performance.

I guess the reason that isn't stored in a class instance variable for you automatically is because there's potentially a lot of them (especially when you factor in vectors). And now we'd need a mechanism to set all that class back data back to null (or leak that data).

Worse, the above mechanism isn't compatible with processing multiple FlatBuffers, as you'd still be pointing to the first when iterating the second. You could fix this somewhat by always calling GetInAppPurchase even when the object already exists, but that then in turn wouldn't work with concurrent access to 2 buffers (either interleaved, or in threads).

The problem may soon be moot, as we're expecting to replace accessor objects in C# with structs, which should be cheap to generate an copy always.

@jmcguirk
Copy link
Author

Cool - definitely appreciate the background and insight into it.

Our specific use case (and I wonder if this might be the highway use case?) is that its acceptable that the class member variables stay resident in memory after reading as this represents configuration data thats queried on demand. Optimally we'd clean this up, but it represents acceptable loss for us for significantly better usability

I'm trying to wrap my head around this bit:

Worse, the above mechanism isn't compatible with processing multiple FlatBuffers, as you'd still be pointing to the first when iterating the second

Is this roughly what you're describing?

  1. Have a schema defining two tables. A and B. A references B.
  2. Implement the above caching code, when A.B is queried, cache the instance of B.
  3. Open up a flat buffer (X) with a byte[] that represents an instance of A. Query A.B
  4. Open up a different flat buffer (Y) with a byte[] that represents an instance of A. Query A.B -- Result: X's instance of B is returned. Expectation: Y's instance of B is returned.

That is to say, the class instances would themselves be shared across multiple flat buffer reads?

If so - definitely agree that would be a dealbreaker. We happen to have a singleton flatbuffer that contains all our catalog data so potentially less of a concern.

@ghost
Copy link

ghost commented Jan 28, 2016

Yes, that's indeed what I'm describing.

@rw rw added the c# label Jan 28, 2016
@jmcguirk
Copy link
Author

Hmm - I'm wondering if this is actually the case, though its possible I'm not understanding how others are using flatbuffs in practice.

Just taking a look at the code it seems like everytime GetRootAsX is called a new instance is returned, meaning the cached values should be held separate and should remain sane.

Here's a test harness I put together with reference caching code in place including test output

https://gist.github.com/jmcguirk/d7705f6b92ea4c57c045

@ghost
Copy link

ghost commented Jan 29, 2016

There it works because you've made the cached value an instance member as opposed to a class instance member.

To make this work consistently though, you'd need cached copies of all references, including when you have a whole vector of them for example. That's a lot of data we don't really want to store.

@jmcguirk
Copy link
Author

Ah okay - sorry for the confusion. Yeah I think it makes sense to store the cached reference as a member variable.

The cached members should get ejected from memory as soon as the root object is no longer referenced - meaning the only time they're superfluous in memory is when you no longer care about a particular subtree of a root, but still care about another part of that tree - which can be true if your root is long lived and covers a lot of data. This actually happens to be exactly our use case - but, for us, its vastly preferable to managing our own references in calling code.

The vector references are interesting as well - and agree there'd need to be a solution here for there to be a consistent expectation. On our end, we're actually reading the entire vector into a member variable on first query. This was necessary to expedite integration and most of our use cases wound up traversing the entire vector anyways - but I realize this may not match up to the average use case and if the convenience methods were to adopt reference caching it probably makes more sense to cache (lazily) by index rather than create all the instances in one go.

So to sum up - It seems like there's 3 potential options here (2 of which are currently supported)

1.) Use the existing A.B convenience method. Pros: Convenience, easy integration. Cons: GC churn, equality checks, can't cache other member variables via partials.
2.) Use the existing A.GetB(MyInstance) method. Pros: No excess GC generation. Cons: Harder integration path, more boiler plate code to manage references
3.) Use a [New] A.B convenience method that caches references. Pros: Convenience, easy integration, no GC churn, equality checks and caching via partials work. Cons: Dangling references created until root is no longer referenced

In all instances, I agree that #2 yields the best - most predictable performance at the cost of more boiler plate code.

For the convenience methods - our use case definitely favors the semantics of #3 here, but totally understand if its not consistent with your vision for FB

@ghost
Copy link

ghost commented Jan 29, 2016

Yes, I can see there's cases where #3 works well.. but for the sake of keeping the API managable I don't think we should support it (opinions welcome). #1 and #2 cover the spectrum of use cases pretty decently.

@jmcguirk
Copy link
Author

jmcguirk commented Feb 1, 2016

Cool sounds good - we'll maintain our own fork for this behaviour :)

@jmcguirk jmcguirk closed this as completed Feb 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants