Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use get or create to reduce the number of objects created #58

Closed
wants to merge 1 commit into from

Conversation

singh1114
Copy link
Collaborator

Fixes #40

Signed-off-by: Ranvir Singh ranvir.singh1114@gmail.com

Copy link
Collaborator

@haikoschol haikoschol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singh1114 #40 says "use update_or_create" but in this PR you use get_or_create. Can you explain why? Even better would be if you can edit #40 to explain what the problem is that using either of these QuerySet methods is meant to solve. (nevermind, that is already explained by the comment by @pombredanne)

@pombredanne
Copy link
Collaborator

@singh1114 Thank you for this!
do you mind to rebase your PR? I merged first @haikoschol 's and there is a merge conflict now.

Copy link
Collaborator

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good otherwise, though do you think there could be a unit test that would show that there were issues with a bare create before?

@singh1114
Copy link
Collaborator Author

@pombredanne Did that!

@singh1114
Copy link
Collaborator Author

This is looking good otherwise, though do you think there could be a unit test that would show that there were issues with a bare create before?

I think we already had tests which specified the number of objects being created. Reducing the count in the tests itself will account for the same thing.

Fixes #40

Signed-off-by: Ranvir Singh <ranvir.singh1114@gmail.com>
@haikoschol
Copy link
Collaborator

@singh1114 I still don't see how this PR fixes #40. Did you mean to reference #28, which is about duplicate data?

Either way, I think the import process needs to be redesigned (and called "import" instead of "data dump", but let's not bikeshed about that just now :). Apart from issues #28 and #40, the current process is also not scalable, already way too slow, and it will only get worse by using either get_or_create or create_or_update.

We should distinguish between importing historical data once, when an instance of vulnerablecode is provisioned, and continuously importing updates from vulnerability sources. Otherwise the time for the latter process will continue to grow with the amount of data.

@singh1114
Copy link
Collaborator Author

@haikoschol Sorry meant to refer #28 for the duplicate data. #40 doesn't seem to be correct and is a duplicate to #28

Agreed with your view of differentiating both the procedures.

For improving the speed of the dump, we can use bulk_create (and it would be good to call it dump if we use that but handling duplicates in that case will be more difficult).

Anyway, we will have to make sure the data in the DB is unique.

@haikoschol
Copy link
Collaborator

@singh1114 I agree that we should use bulk_create, at least for the initial import, but probably for updates as well.

However, I don't think that #40 is a duplicate of #28. I suggest we continue the discussion what #40 is about directly on the issue instead of this PR.

@pombredanne
Copy link
Collaborator

At this stage and based on the discussions here and the changes that were applied, I think it is best to close and continue the discussion in #40

@pombredanne pombredanne deleted the update_or_create branch January 27, 2020 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace objects.create with objects.update_or_create.
3 participants