Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated references in serializer output #688

Closed
hlg opened this issue Jan 18, 2018 · 7 comments
Closed

Duplicated references in serializer output #688

hlg opened this issue Jan 18, 2018 · 7 comments

Comments

@hlg
Copy link
Member

hlg commented Jan 18, 2018

When serializing to any format, for some entity types, references to other entities are duplicated in the serialization process. For instance, in a STEP serializer, this line from an IFC checkin:

#10 = IFCRELAGGREGATES('1234567890abcdefghijkl', #11, $, $, #12, (#13, #14));

may become this line when downloading as IFC:

#15 = IFCRELAGGREGATES('1234567890abcdefghijkl', #16, $, $, #17, (#18, #19, #18, #19));

At first glance, this seems to be limited to objectified one-to-many relationships, such as IfcRelAggregates and IfcRelDefinesByType. However, not all of these objectified relationships are affected, e.g. IfcContainedInSpatialStructure and IfcRelDefinesByProperties are not. These are just examples, I did not do an exhaustive search for occurrences.

After inspection of the ecore files, I found that the unique attribute of the respective structural feature may be related to the observed errors. For the features where the error occurs, unique is not set, while for the ones where it does not appear it is set to false. Assuming that true is the default value, this would correspond to this upside-down piece of code from the DatabaseSession.readList method:

if (feature.isUnique()) {
list.add(referencedObject);
} else {
list.addUnique(referencedObject);
}
I don't know how the duplication crept in in the first place, but here it seems to be removed, but only when unique is set to false.

In theory, I guess unique should only be set to false for features that reference LIST type collections in Express, while the default true should apply to all features that reference SET type collections in the IFC schema.

@rubendel
Copy link
Member

Do you have a specific combination of deserializer, ifc file and serializer to reproduce this? That would make it easier for me to fix it.

The sole reason for this if statement is performance. Adding to large lists can get extremely slow if you need to do uniqueness checking (for a list this means iterating over the whole thing, for every addition). So that's why add is called for unique items (it needs to check for uniqueness), and addUnique is called if the feature is not unique. addUnique is the method that does not do the uniqueness checking. So I don't think the bug is in the mentioned code, probably somewhere else.

@hlg
Copy link
Member Author

hlg commented Jan 19, 2018

Ok, I see, I interpreted the method names wrong. addUnique does not add the element such that the list is guaranteed to be unique afterwards, but instead it adds an element that is supposed to be unique already. I was just searching where isUnique() is used, because I realized the correlation of the error occurence with the ecore feature attribute.

As for the combination of deserializer, ifc file and serializer: We first encountered the issue with an IFC4 STEP deserializer, IFC4 RV export of the Revit advanced sample with default settings and a custom serializer. I then confirmed it with the same project, but the (non-streaming) IFC4 STEP serializer, and subsequently with the IFC2x3-STEP-deserialized Smiley West sample and the (non-streaming) IFC2x3 STEP serializer.

Following up to your request, I created a minimal sample file (attached below) and did a systematic evaluation, see the following table for the results with deserializer in columns, serializer in rows. It seems that the reference duplication error only occurs for specific combinations, however I found other errors, that is either the collection-type references of objectified relations missing or attributes missing alltogether, including references.

IFC2x3 IFC4
IFC2x3 duplication error references missing
IFC2x3 streaming ok ok
IFC4 references missing duplication error
IFC4 streaming ok ok
JSON attributes missing attributes missing
JSON streaming ok ok

All are STEP serializers and deserializers, no XML involved. I think I used the streaming deserializers, since they are selected by default once the file is uploaded. Hope that helps to find the cause.

duplicaterelations_2x3.ifc.txt
duplicaterelations_4.ifc.txt

@rubendel
Copy link
Member

rubendel commented Jan 22, 2018

IFC2x3 IFC4
IFC2x3 fixed in #688 [1]
IFC2x3 streaming ok ok
IFC4 [1] fixed in #688
IFC4 streaming ok ok
JSON [2] [2]
JSON streaming ok ok

[1] Conversion between schema's at the moment is only done in a really dumb way. I think the only way to come up with a decent conversion both ways is a manual implementation of the conversion. Not something high on my priority list at the moment.

[2] I'll have a look at this later

@hlg About #688. So you were really close :) It's basically the same code, just in a different place where I had to add it. The reason why the duplicates exist in the first place is btw the inverse's EMF is trying to fix right away. There is definitely room for improvement here (performance wise), but the performance of the serializers is not a problem at the moment.

@rubendel
Copy link
Member

Just wondering, is there a specific reason for using the JSON non-streaming serializer? All my projects use the streaming serializer because it's faster and has less impact on BIMserver memory usage (that's actually true for all non-streaming serializers)?

The JSON serializer I am planning to remove in the future if no one comes up with a good reason to use it.

@hlg
Copy link
Member Author

hlg commented Jan 24, 2018

No, I don't really have a reason to use the non-streaming JSON serializer. I evaluated all above serializers including that one to help hunting down the cause of the error.

But indeed I use non-streaming serializers in a custom serializer plugin for a reason: The streaming plugin serializers seem not to be initialized with the plugin configuration. I will open a separate issue for that.

@rubendel
Copy link
Member

That issue (#693) I think has been resolved now. I think removing the JSON serializer will resolve this issue as well, agreed?

@hlg
Copy link
Member Author

hlg commented Jan 26, 2018

Yes, agreed. The original issue was only concerned with the duplication in same-schema non-streaming serializers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants