Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alphabetic ordering of packages #491

Closed
2 tasks done
baszalmstra opened this issue Aug 28, 2023 · 10 comments
Closed
2 tasks done

Alphabetic ordering of packages #491

baszalmstra opened this issue Aug 28, 2023 · 10 comments

Comments

@baszalmstra
Copy link

baszalmstra commented Aug 28, 2023

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

I noticed that conda-lock.yml files are topologically sorted.

I was wondering why this was chosen over alphabetic sorting. It feels to me that alphabetically sorting the packages makes the format more stable compared to topological sorting. If a package version changes and it adds or removes a dependency this might change the order of packages in the format resulting in a large chance when viewing the diff.

Why is this needed?

It would be ideal that small changes of the lock file would also result in small changes in a diff/patch. Large changes cause reviews to skip reviewing lock-file changes while the small chance might be significant.

Since the dependencies of packages are also stored, the topological sorting of packages can relatively easily be reconstructed from the file itself (Mamba already does this, rattler doesn't need it).

What should happen?

Instead of sorting the packages in the file topologically we sort them by name, by platform, by version. I think this will result in the smallest possible diff when:

  • A package changes version
  • Adding or removing a package

Which I think are the most common causes of change.

Additional Context

Would love to hear your thoughts! :)

@ismail
Copy link

ismail commented Aug 28, 2023

The main problem with the current setup is that when you add a new package the lock package is shuffled, this makes git diff unusable. Usually, the diff is so big that GitHub won't show it unless you manually load it. Of course, even if you load it manually, making sense of the changes is not easy at all. I'd say there is a usability angle to this, but also a security angle.

With an alphabetically sorted setup, it's a breeze to audit the lock file changes. I hope you consider and accept this proposal.

@maresb
Copy link
Contributor

maresb commented Aug 28, 2023

I suspect that this is an artifact of the old explicit format, where the package URLs are listed without the dependencies. And because the dependency info is not included with the explicit format, the only way to convey the correct installation order is to topologically sort the dependencies beforehand.

With the new format, since topological sorting is fast and easy, I see no reason to do it beforehand. (@mariusvniekerk, please correct me in case I'm missing something.) I'd be happy to consider a PR. (I think it should be really trivial.)

@FelixSchwarz
Copy link

I think this was implemented in 70642a8 with a few related tickets: #181, #184, #170

However I also would like to see alphabetical sorting - this would make reviewing diffs much easier.

@maresb
Copy link
Contributor

maresb commented Sep 29, 2023

Thanks a lot @FelixSchwarz for looking through the history. It really seems like toposort was chosen simply to fix the package order.

It seems to me like as long as we ensure that explicit lockfiles are toposorted, then there should be no problem with sorting the unified YAML format alphabetically.

Also, when toposort is computed breadth-first like here, then we could easily track the toposort_depth as a nonnegative integer. Then we could trivially recreate the current hybrid topological/lexicographical ordering by sorting on the key (toposort_depth, name). (To be explicit, toposort_depth==0 for packages with no dependencies, toposort_depth==1 for packages which only depend on packages with no dependencies, and so on.)

@mariusvniekerk
Copy link
Collaborator

mariusvniekerk commented Oct 11, 2023

So one thing that is done with the sorting is that it explicity uses the same topographical sort as conda so that the behavior for ties in topographical orderiung is the same as for conda. That was largely why this was done. The format does likely contain enough information to perform this sort on-demand now

@mariusvniekerk
Copy link
Collaborator

It should be relatively simple to perform alphabetic (maybe (platform, manager, package)) ordering on serializing the Lockfile to disk and to just call

def toposort_inplace(self) -> None:
self.package = self._toposort(self.package)
after loading the Lockfile from disk

@maresb
Copy link
Contributor

maresb commented Oct 12, 2023

We should agree on an ordering of the keys.

I actually had in mind (package, manager, platform), the idea being that if one package changes, then it will typically change similarly for each platform, and then you'd see those similar changes grouped together.

Putting platform first is more in agreement with how conda-lock works internally, but in terms of reviewability of the lockfile it'd mean that a change to a package would be spread across several locations (one for each platform) in the lockfile. And I think this issue is about increasing reviewability.

@baszalmstra
Copy link
Author

With categories, is it possible to have multiple versions of the same package? Then we also have to sort by category/version/build.

@maresb
Copy link
Contributor

maresb commented Oct 12, 2023

@baszalmstra, nope, there's a single solution having fixed versions over all categories. Then categories allow for selecting a subset of packages within that single solution.

@maresb
Copy link
Contributor

maresb commented Nov 20, 2023

Thanks @baszalmstra for the excellent suggestion! This is now available in v2.5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants