Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging variants sorting algo #165

Conversation

holofermes
Copy link

No description provided.

@fnaum
Copy link

fnaum commented Mar 24, 2015

Hi,
This week we also merged the variant sorting algorithm into the resources2 branch.
While all the unit tests passed, there was one thing that was not covered by the UT but sort of broke one tool we have on top of rez.
We discovered that there is a small issue with the indexes (that are sorted by the variant solving algorithm) at the time the resolved context, .rxt file is saved.
I'll put the comments in the code.
Fede

num_packages += 1
loaded_variants.append(variant)
# sort all now by version and variant
loaded_variants = sorted(loaded_variants, key=lambda v: (v.version, v.index))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have differences in the code from line 911 to 923
our codes reads as it is on the resource2 branch but with value = sorted(value, key=lambda v: (v.version, v.index)) added before assigning it to entry[1] = value

                       value.append(variant)

                # sort all now by version and variant
                value = sorted(value, key=lambda v: (v.version, v.index))
                entry[1] = value
            variants.extend(value)
            num_packages += 1

with value=[] on line 895

@holofermes
Copy link
Author

Hi,
Actually, I am able to pass all the UT only with setting max_depth = 0, plus there are a few other things I did in this merge, that might be dubious. I think Alan is still taking a look at it.

@@ -1499,6 +1881,9 @@ def _get_solved_variants(self):
for scope in self.scopes:
variant = scope._get_solved_variant()
if variant:
# We changed the index to help the solver pick the 'preferred' variant
# At this point we should add it back for the original index
# variant.index = variant.userdata.variables.get('index', None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was needed to restore the original index, but by doing here in the new resource2 branch it ends up with some inconsistencies on the .rxt file. So I have now also removed from here and at the end of Solver.solve_step() (line 2139) I made a call to a new function called self._revert_sorted_indexes() which reads

+    def _revert_sorted_indexes(self):
+        # We changed the index to help the solver pick the 'preferred' variant
+        # Restoring the original index from the userdata (resource handle)
+
+        # Get the last (succeeded or failed) phase
+        final_phase = self.phase_stack[-1]
+
+        for scope in final_phase.scopes:
+            variant = scope._get_solved_variant()
+            if variant:
+                variant.index = variant.userdata.get('variables').get('index', None)

@fnaum
Copy link

fnaum commented Mar 25, 2015

The way you are merging it in this PR seems that is reverting some of the changes in resource2.
Do you want me to send just the diff of solver.py (resource2 vs resource2 + variantSorting) ?
or perhaps in my fork update the PR against resource2?

@holofermes
Copy link
Author

I think a diff would work fine, and yes, my merge is a little shabby.
I'm still fresh to the rez world.

Thanks for your comments.

@nerdvegas
Copy link
Contributor

Hey @fnaum, @holofermes,

Just a note on PackageVariant that might help clear things up. The solver code is almost completely separated from the rest of Rez (this is deliberate - I would like to extract it completely one day). PackageVariant is the solver's own copy of the variant data it needs in order to do its thing. "userdata" is just blind data that the solver keep attached to each variant, then ends up giving back in the result. So the solver should DEFINITELY not be using anything from within userdata. Usedata could be None and the solver should still work.

Thx
A

@fnaum
Copy link

fnaum commented Mar 26, 2015 via email

@fnaum
Copy link

fnaum commented Mar 26, 2015

Looks like github blocked the attachments, I've sent you the diff via email

@holofermes
Copy link
Author

Hi,
Thanks for the files.
I definetely made more changes than necessary, but even with your diff, I'm still getting 4 failures from UT:
test_variant_selection_requested_priority_3
test_asymmetric_variant_selection
test_variant_with_antipackage
test_variant_with_weak_packages

Can't upload files from here, so here's a gist of the test's output.

@nerdvegas
Copy link
Contributor

Hey all,

I've just spent a day looking over all this work and trying to come up with something that gives similar, sane variant ordering but with less code (so performance isn't hit too much). I have a contender in the new 'variant_sorting' branch. I think the trick here is to not worry too much about getting the 'most correct' answer in every edge case - because regardless of how good the variant sorting is, it's already been shown that we can never be guaranteed exactly the variants we might expect to get. At best, if we have non-mutually-exclusive variants in a package then we can only hint at which variant we would like - we do need a new feature that allows us to explicitly request a variant.

So, it behaves like so... consider the pkg:

name: bah
- [ foo-1, maya-2016 ]
- [ maya-2015, boost-1.56 ]

Consider the request "rez-env maya boost bah". Variant #1 is the correct choice, because 'maya' pkg has priority, and this variant has the highest version of maya. Looking at the current sorting code I think there was a problem here - in this eg var #2 has the higher 'intersecting weight' and so would have been chosen first.

The rules I'm using are as follows:
(A) create a list of (-position, version-range) for each pkg in the variant that is also in the initial request list;
(B) create a list of version-range for every other pkg in the variant that is NOT in the initial request;
(C) Create the following tuple:
(A, -len(B), B, variant-index)
(D) Use this tuple as the variant sort key.

So in layman's terms we're sorting first by the presence and version of requested packages, then by the number of non-requested packages, then by the versions of non-requested packages, then by the variant index (this is arbitrary - it just gives us repeatable order).

So the two variants above will get the sort keys:
([(0, 2016)], -1, [1], 0)
([(0, 2015), (-1, 1.56)], 0, [])

Some important points-

  1. The variants don't actually need to be sorted until a split() occurs (I've done this here);

  2. Indices don't need to be reordered. The order of variants in the sorter isn't related to the original variants' indexes. So we can actually shuffle variants around inside the sorter to our heart's content - the variant indexes don't really matter (in fact they're only present in the PackageVariant class largely so the debug output can print something readable). This isn't immediately obvious.

  3. I initially "improved" the sorting by having it based on the current request list, rather than just the initial request list. But this isn't actually a good thing - I think that using the current state of the solve to order variants is a bad idea, because it ties us to this solver implementation too much. For example in a SAT solver, it would be impossible to mimic this behaviour. This behavious so so subtle it's hard for any user to understand anyway - I don't think it really gives us anything.

  4. I initially was using a 'positional weight' like the PR too, but then I realised that we're not getting much for the extra cost - this only takes effect for strange variants such as:

  • [foo-5, bah-6]
  • [foo-5, bah-5]
  • [bah-7, foo-5]
    ..in this example it isn't even really clear if preferring 'foo' is correct - there isn't any one clear sorting rule. I've opted to keep it simple - in a request not including foo or bah, the above will result in var2 as first preference, then var0, then var1.

Thx
A

@fnaum
Copy link

fnaum commented Apr 1, 2015

Hi Allan,

Thank you so much to for looking into this. I don’t remember the performance of the variant sorting being a big issue (something like 0.09 sec in a ~170 packages environment) but I'm all for less code and a simpler algorithm, and of course any little gain in performance is desirable if we can still get some “predictable” resolves. I also agree with you that we should not focus on this obscure cases I wrote in the unit tests and make the whole thing simpler.

The algorithm you describe sounds ok to us, but I don’t think we will be able to use it until we have a feature that allows us to explicitly request a particular variant. The way some of our packages are modelled means we need to use the 'intersecting weight' strategy over the the 'highest version' of the packages in the request. For us to be able to stay in sync with 2.0/resources2 more easily would it be possible to make the sorting configurable? Perhaps a configuration option for the solver sorting which is either "intersecting" or "highest"? We understand this is maybe not compatible with future solver changes (SAT) but helps us stay up-to-date while we re-arrange our packages?

On the other hand, does explicitly requesting a variant create a tight coupling between the packages? I guess this is more a problem where we have variants that aren't mutually exclusive but to have that apply across an entire resolve everything would have to explicitly request the specific variant. Currently the sorting is doing that for us.

Regarding your points:

  1. That’s a great point, the second time a looked into this with a little bit more understanding of the solver step I thought that was possible, but I did not want to introduce changes until you and Fabio had taken a look at it.

  2. Good to know that.

  3. We thought about that, but at that point we could not decide whether that was preferable.. With the later point your mentioned about changing the solver in the future, then I agree we should base it on the initial request list.

  4. Agree. Simpler is better, and I don’t think we have such a contrived use case in real life.

Cheers
Fede

@nerdvegas
Copy link
Contributor

Hey Fede,

So RE intersecting weight - are you intersecting only with the original request list, or with the current solver request list? And would intersecting only with the original list suffice? I say because this relates to something I've already raised - that choosing variants based on the current solver state is probably a bad thing.

So just so I understand, you're saying that you guys are definitely relying on intersecting weight taking precedence over highest package version? Ie, in my original example earlier, you need for the second variant to take precedence, even though it has a lower version of maya?

If you need that for now then I think we could add a config option. It only slightly changes the sorting code. I'll wait to hear back from you before doing this just so we both fully understand the situation (I think I do though). I think from a purely theoretical POV that the current sorting is more "correct" (because package order precendence is one of the fundamental rules in the solver) but I understand if you need other behaviour until a manual variant selection feature is added.

Speaking of manual variant selection, I would see this being used mostly by the user, rather than in package requirements (since it does couple package together somewhat), but I also think there will be unavoidable cases where we'll have to use it in package requirements. Whatever gets the job done I guess.

Thx
A

@fnaum
Copy link

fnaum commented Apr 2, 2015

Hi Allan,

We were only intersecting with the original request list not with the current solver req list. I does suffice for us ...at least it seems enough in all our use cases (UT and IT) and production.

Correct, we do rely on intersecting weight taking precedence over highest version. In your previous example the 2nd variant will be the correct for us. We prefer the variant that is more similar to our initial request. Lacking the feature for manually selecting a version, we found this way to hint the solver to try that variant "we want" first. But as you said , if later there is a method to explicitly tell what we want then we can revert back to the more theoretical correct sorting.

That would be great if you can add that config option for us. Thank you so much.

I agree with you, I see the users using manual selection, but I'm pretty sure that if someone building a package sees the 'power' to do that inside a package they will definitely use it to get the job done.

Thanks and have a good Easter break
F

@nerdvegas
Copy link
Contributor

Hey Fede,

I've just pushed an update to 'variant_sorting' branch. To turn on intersecting weight precedence, set this config option:

variant_select_mode: intersection_priority

I haven't included any UTs for this. I'm in two minds about it - now that there are two configurable modes, there are an awful lot of different cases to test - but not much code involved to actually do the sorting. Perhaps if we hit any problems testing this out I'll start adding back some UTs.

Anyway let me know if this behaves the way you expect, if and wen it does I'll merge back to resources2.

Thx
A

@fnaum
Copy link

fnaum commented Apr 8, 2015

Thank you Allan,

We will pull this change and test using our integration tests, if they
pass, that will give us lot of confidence.!!

We'll be back with the results pretty soon as we want to integrate the
whole resource2 branch and make a release of that version.

F

On Fri, Apr 3, 2015 at 10:44 AM, allan johns notifications@github.com
wrote:

Hey Fede,

I've just pushed an update to 'variant_sorting' branch. To turn on
intersecting weight precedence, set this config option:

variant_select_mode: intersection_priority

I haven't included any UTs for this. I'm in two minds about it - now that
there are two configurable modes, there are an awful lot of different cases
to test - but not much code involved to actually do the sorting. Perhaps if
we hit any problems testing this out I'll start adding back some UTs.

Anyway let me know if this behaves the way you expect, if and wen it does
I'll merge back to resources2.

Thx
A


Reply to this email directly or view it on GitHub
#165 (comment).

@fnaum
Copy link

fnaum commented Apr 9, 2015

Hi Allan,

We merged those changes today, and we have a couple of UT and IT failing.

They do fail because the last (arbitrary) criterion of the sorting now is the variant index (arbitrary but repeatable sort order), and before we used to rely in alphabetical order as the last resource .

Consider:

packageA
 - ALboost-1.55.0
 - boost-1.55.0

and

packageB
 - boost-1.55.0
 - ALboost-1.55.0

Where boost and AL_boost are not mutually exclusive, then

rez-env packageA packagesB

will end up with one of each variant and we want both to be the same

packageA-1.0.0     packageA/1.0.0/boost-1.55.0     
packageB-1.0.0     packageB/1.0.0/AL_boost-1.55.0

So, sorting by variant index is arbitrary and give you repeatable solves across runs, but to get the consistent variants when not specified across packages it needs that all packages have the variants written in the same order, ie. first the boost variants then the AL_boost variants

Sorting alphabetically as the last criterion, is also arbitrary, but it gives you consistency between runs and also across variants of the packages regardless the order in which they were written.

Of course, we know that that is a vague question to rez, and if we give more clues to the solver specifying boost or AL_boost then we get consistent variants across packages.

what do you think?

@nerdvegas
Copy link
Contributor

Hey Fede,

Sorry I'm not quite sure from your description if you want to sort on highest package versions THEN alphabetical, or alphabetical THEN highest version? What if we had:

packageA
 - ALboost-1.55.0
 - boost-1.57.0

and

packageB
 - boost-1.57.0
 - ALboost-1.55.0

Are you saying you would want to get the ALboost variant here, even though the boost version is higher, or not?

I've pushed an update that sorts on alphabetical as last resort (higher version takes precedence). Let me know if that gives the results you're after. Note that index is still used as a very last resort - see the docstring on sort_variants().

Thx
A

@fnaum
Copy link

fnaum commented Apr 10, 2015

Hi Allan,

Thank you so much for the quick turn around. This gives us what we are after.

I don't mind if it is higher version and then alphabetical or vice-versa .What we care is that if no package name from the variants is specified in the request and more than one package have the same variants, then it choose the same variant for all of them. In your example above if it chooses boost-1.57.0 for all packages we are good !!

So we are all good now!! Thanks again!

We are going to release this version today and have a handful of people testing with production packages and presets.

Cheers
Fede

PS: This ends up sorting it in reverse alphabetically because line 712, but this is still OK, as I said we care that it takes the same variant if available

@nerdvegas
Copy link
Contributor

Great news! I'll do the merge after your testing goes ahead. Good to have this one out of the way.

I've merged the new string escaping with resources2 today btw too. I think we're starting to get into good shape. Still have those pesky docs to write though.
A

Sent from my iPhone

On Apr 9, 2015, at 10:34 PM, Federico Naum notifications@github.com wrote:

Hi Allan,

Thank you so much for the quick turn around. This gives us what we are after.

I don't mind if it is higher version and then alphabetical or vice-versa .What we care is that if no package name from the variants is specified in the request and more than one package have the same variants, then it choose the same variant for all of them. In your example above if it chooses boost-1.57.0 for all packages we are good !!

So we are all good now!! Thanks again!

We are going to release this version today and have a handful of people testing with production packages and presets.

Cheers
Fede

PS: This ends up sorting it in reverse alphabetically because line 712, but this is still OK, as I said we care that it takes the same variant if available


Reply to this email directly or view it on GitHub.

@nerdvegas
Copy link
Contributor

Hey @fnaum, any movement on this? I'd like to go ahead and merge with resources2.
Thx
A

@fnaum
Copy link

fnaum commented Apr 21, 2015

It is working quite well, a bunch of us have been using it internally in RnD and nobody reported any problem with the way it picks the version. UT and IT are all succeeding.
Even more, today we found an edge case where using the old variant sorting algorithm ended up in infinite loop, but this version handled without a hitch

So I'll say it is gold to be merged on resources2 👍

Thanks a lot
Cheers
Fede

@nerdvegas
Copy link
Contributor

I've just merged 'variant_sorting' branch into resources2, which deprecates this PR, so closing (finally!)

@nerdvegas nerdvegas closed this Apr 21, 2015
@fnaum
Copy link

fnaum commented Apr 22, 2015

Awesome!!
Thanks!

On Wed, Apr 22, 2015 at 5:49 AM, allan johns notifications@github.com
wrote:

Closed #165 #165.


Reply to this email directly or view it on GitHub
#165 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants