-
-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(refactor): refactor toESDocument() to be more efficient #68
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy 💩! This is huge
Document.js
Outdated
}; | ||
|
||
// remove empty properties | ||
if( _.isEmpty( doc.parent ) ){ delete doc.parent; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be faster to conditionally add these properties than add and then conditionally remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test it but I think doing something like:
if( !doc.hasOwnProperty( 'parent' ) ){ doc.parent = []; }
would be less efficient as it would need to be executed once per call to addParent()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean don't set parent
when initializing doc
, then doing
if (!_.isEmpty(this.parent) {
doc.parent = this.parent
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_.isEmpty
is somewhat inefficient in this case because it checks for 5 other types of empty conditions before it finally gets to plain old objects. You might see better performance by just checking Object.keys(this.parent).length
since you know it's always going to be an plain old object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trescube arguing AGAINST using lodash? you feelin ok bud?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know! It pained me to type that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agh I see what you mean, to be honest I did it like this so that the key ordering of the output matched the fixtures, I suspect that if we change the key ordering then there will be a big cleanup effort with all the unit tests fixtures across all the importers.
I don't think the delete
call is very expensive so I'm happy to pay that cost in order to avoid having to fix all the fixture files :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do it like this? it's not so pretty :(
Document.prototype.toESDocument = function() {
var doc = {
name: this.name,
phrase: this.phrase
};
if( !_.isEmpty( doc.parent ) ){ doc.parent = this.parent; }
if( !_.isEmpty( doc.address_parts ) ){ doc.address_parts = this.address_parts; }
doc.center_point = this.center_point;
if( !_.isEmpty( doc.category ) ){ doc.category = this.category; }
doc.category = this.category;
doc.source = this.source;
doc.layer = this.layer;
doc.source_id = this.source_id;
return {
_index: config.schema.indexName,
_type: this.getType(),
_id: this.getId(),
data: doc
};
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the performance tests for that most recent version you linked as well? i've read (can't find the link now but will look for it) that modifying an object many times like this, rather than setting as many properties as possible when the object is first created, is bad for performance. it would be interesting to see what the actual difference (if any) is
refactor code for performance & readability
this PR removes the slow JSON serialize/deserialize code. that code converts each object to a string and then back in to an object in order to take advantage of a feature in
JSON.stringify
which removes empty properties.it turns out this is potentially slow, so I refactored the code to do the same thing using a different method.
benchmarks:
works out to be ~75% faster, multiplied out to 500 million records, this change could save 10+ hrs