-
-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use matroid-database package #37231
Use matroid-database package #37231
Conversation
Perfect, thanks |
I copy below my comment from #37140. IMHO that PR should be reverted, and replaced by an external database package. Also, I understand that This PR almost doubles the size of
This is not the way databases are handled in sagemath. The normal way would be to make an optional spkg containing the database. The nicer/easier way would be to create a pypi package similar to https://github.com/sagemath/conway-polynomials (btw, in conway-polynomials the actual data files are xz compressed and transparently decompressed as needed) Also, as @antonio-rojas pointed out, this broke |
@tornaria You mean #37140, correct? However, I am not sure how much I favor creating additional optional as they are a pain to install unless you have a source build IIRC. Putting things on pypi currently means it goes out into the either and nobody really knows it exists. We (the Sage community) don't really have a good policy for how deal with small databases. (The pypi issue is a bigger thing for me. I am not aware of any good way to find what's available, and I don't have a lot of trust with that code to stay current with changes to Sage.) In this case, we have a database that is available online but it isn't packaged in any standard form. I agree that this is not the best solution, but without a good way to install optional packages (which would be preference otherwise), I don't see a better option for a somewhat small database like this. Or has something changed with optional package installation? Lastly, I wasn't aware of the deprecation. It does not really seem like a formal deprecation, much less have a plan to replace it. |
Please have a look at the changes. |
Indeed
It should be as simple as
For sage-the-distro is just a (standard or optional) spkg as always. In general, if it's optional it can be handled by "features" which means when you want to use the feature, if it's not available, it will explain what is missing and what is the suggested way to install the feature depending on your installation.
We do. We have a number of databases which are external spkgs, and some work has been started to move them to python packages (I mentioned conway-polynomials, which is a core database which is used to create finite fields, very non-optional I think). The python package is very simple, easy to install in varying circumstances (via pip, as a spkg, as a distro package, conda, etc). Here's the data packages I have installed (I think they are all the standard data packages):
These are all smaller than the matroid database, and still they were deemed to be external packages.
The code would be as a sagemath project in github (e.g. https://github.com/sagemath/conway-polynomials). Having it in pypi means they are easy to install using pip, etc. And sage-the-distro would still have a spkg for this which should be very easy to write and maintain (as are distro packages). Moreover, the databases change very seldom which means all of that data is installed once and doesn't need to change when updating sagemath, etc. In summary, having a database as a spkg is the usual policy for sagemath. Having a database as a "proper" python package in pypi instead of just a custom spkg has some advantages and there would still be a spkg (based on the pypi package instead of custom spkg). Anyway, if you prefer to have a "custom spkg" version of the database, that should be fine as well. The current way is that databases are searched for in a few paths given by
If you want this database to be standard instead of optional, then a vote has to be held in sage-devel. But the policy for databases (either standard or optional) is to have them as separate packages.
No, the idea is that package data should go inside packages and use importlib.resources to access the data. This is for data tied to sage source code. For databases, the policy AFAICT is to have them as separate spkgs. |
I already stated my opinion: I don't think it makes sense to add database files to the repo, even if they are compressed. These files seem to be 10+ years old, we don't need to update them every release (otoh, if they had some bug or were extended, it's nice if they can be updated without having to change sage!) Having an external spkg is the current policy. Making the database "standard" AFAIK requires a vote in sage-devel. Note: even if the spkg is only optional, if the database files are searched via Regarding the compression:
With this the largest files are:
Every other file fits in 4k and the total is:
|
@@ -195,24 +197,20 @@ def AllMatroids(n, r=None, type="all"): | |||
rp = min(r, n - r) if (type != "unorientable") else r | |||
type_file = "all" if (type != "unorientable") else "unorientable" | |||
file = os.path.join( | |||
str(SAGE_EXTCODE), "matroids", "database", | |||
str(SAGE_SRC), "sage", "ext_data", "matroids", "database", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SAGE_SRC
is not the way to avoid using SAGE_EXTCODE
. The idea of #33037 is that one should not use neither SAGE_LIB
nor SAGE_SRC
, and SAGE_EXTCODE
.
Instead, one is supposed to use importlib.resources
to access package data (which I guess should be placed in the package that owns the data instead of the generic src/sage/ext_data
although I guess you could still access files in there via sage.ext_data
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, using SAGE_SRC
should also be avoided completely because this variable does not have a meaningful definition in the presence of PEP 420 namespace packages.
xz with -e flag (extreme)
Fair enough. If there is consensus on this or you are confident enough of this requirement let me know and I'll try to make it into a package.
That's quite cool. I compressed them all for consistency/simplicity. |
I do not quite understand your objection. Any package on pypi can be installed on any system with python. The package might have a build-dependency or run-dependency on sage, and this is handled by pip. For a purely data+python package such as the one discussed here it would only be run-dependency. As an example, you can try
which is a pure python library with run-dependency on sage. What is a pain in the ass are the external packages that requires a recompilation of the sage source code in order to work. For this not only you need a compiled version of sage but also to recompile part of sage to get it work. And this is a nonsense to package. About advertisement, good projects (as admcycles) tend to survive if people use it and developers maintain it. And I think that dead code in sage is not better than a dead project. |
Let me first start with my proposal going forward. A pypi package for the database could work provided we have a clear mechanism for telling a user what they need to do to use the feature (both in code when trying to do it without installing the database and in the documentation with a link to the database). This is assuming that we don't have an auto-download/install feature, but perhaps that becomes a sysadmin security concern? I am upgrading this to a blocker as I think we all agree that we need to settle this before the next full release. Now let me explain my positions a bit more. @videlec @tornaria Note there are many differences between Sage optional packages and pypi package. Optional Sage packages:
pypi packages:
It isn't about the package run-time dependency on Sage, but the other way around: Sage's run-time dependency on a pypi package. This case might be a good place to start setting a policy about this, or at least one reasonable attempt at one.
I find the argument to be a weak one in general, but a bit stronger for databases. We generally change Sage to push the (minimum) version numbers to not include buggy code. I don't see how we are updating them every release. There are some mild benefits for upgrading one small part independent of our release cycle. Yet we release somewhat frequently and I would tell people just to upgrade Sage.
Who is to judge what counts as dead code? If someone is trying to use it but it is broken, then is it really dead? Frankly, I find this argument completely unconvincing as this doesn't tell someone (who is not in-the-know) what is available. |
Some quick comments:
I believe it was a mistake to merge #37140, at least without a wider discussion. The way forward IMO is to revert it and have the discussion or just add the database as a spkg (pypi or not). Even having the database as a standard package would be way, way better than having it in the source. |
@tscrim This is a solved problem. We advertise packages that are on PyPI by creating an SPKG for it. This puts them in our package index https://doc.sagemath.org/html/en/reference/spkg/index.html#optional-packages See also: |
For @videlec's example package "admcycles", we also have a corresponding SPKG: https://doc.sagemath.org/html/en/reference/spkg/admcycles.html#spkg-admcycles |
Also for reference:
|
A solution with duct-tape and glue at best. I don't see any documentation saying someone should do that (or even encouraging it). An unmarked list interspersed within a huge list contained in an even bigger technical-looking list isn't really advertising. Nothing is said about any automated testing; the only way I am aware of testing on optional packages is by us in the community testing in various combinations. I feel like you are trying to sell me a hamburger without cooking it. |
Really now? |
What about my points is wrong? |
Newly created matroid-database pypi package: https://pypi.org/project/matroid-database/
I think you need to add |
We would use |
That's right. Without declaring the feature, the |
Add PythonModule('matroid_database') Use # optional - matroid_database
Co-authored-by: Matthias Köppe <mkoeppe@math.ucdavis.edu>
src/setup.cfg.m4
Outdated
@@ -107,27 +107,26 @@ sage.repl.rich_output = | |||
|
|||
sage = | |||
ext_data/* | |||
ext_data/kenzo/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to back out the cosmetic change (alphabetic sorting) from this PR because it will conflict with #36951
|
||
License | ||
------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add license and more detailed description -- for example by fixing the metadata of the package on PyPI and re-running sage --package create matroid_database --pypi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did so manually.
src/sage/features/databases.py
Outdated
@@ -262,7 +262,8 @@ def __init__(self, name='polytopes_db'): | |||
|
|||
|
|||
def all_features(): | |||
return [PythonModule('conway_polynomials'), | |||
return [PythonModule('conway_polynomials', spkg='conway_polynomials', type='standard'), | |||
PythonModule('matroid_database', spkg='matroid_database'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to make this a separate feature class DatabaseMatroids
.
Then, just before you import matroid_database
, use DatabaseMatroids().require()
. This will give instructions to users how to install it.
Documentation preview for this PR (built with commit 7a0a502; changes) is ready! 🎉 |
This is working well. Let's merge it. |
Remove matroid database files (`ext_data/matroids`) and use newly created `matroid-database` pypi package instead. Links to `matroid-database`: [pypi](https://pypi.org/project/matroid- database/), [github](https://github.com/gmou3/matroid-database) Outdated: > Compress (via `xz -e`) the matroid database files (~75MB to <1MB). The files are now opened with `lzma.open()` and the file paths are located through `sage.ext_data` and not relative to `SAGE_SRC` or `SAGE_EXTCODE`. These changes attempt to address the issues raised by @tornaria. > > Also, the addition of the matroid database files inside the folder `ext_data` needs to be noted in the file `setup.cfg.m4`. This issue was brought to my attention by @antonio-rojas. > > These follow the merging of sagemath#37140. > > I also took the initiative of reordering alphabetically the list of paths inside `setup.cfg.m4` and of removing the line `ext_data/images/*` which seemed redundant. <!-- Why is this change required? What problem does it solve? --> <!-- If this PR resolves an open issue, please link to it here. For example "Fixes sagemath#12345". --> <!-- If your change requires a documentation PR, please link it appropriately. --> ### 📝 Checklist - [x] The title is concise, informative, and self-explanatory. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. URL: sagemath#37231 Reported by: gmou3 Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe
Remove matroid database files (`ext_data/matroids`) and use newly created `matroid-database` pypi package instead. Links to `matroid-database`: [pypi](https://pypi.org/project/matroid- database/), [github](https://github.com/gmou3/matroid-database) Outdated: > Compress (via `xz -e`) the matroid database files (~75MB to <1MB). The files are now opened with `lzma.open()` and the file paths are located through `sage.ext_data` and not relative to `SAGE_SRC` or `SAGE_EXTCODE`. These changes attempt to address the issues raised by @tornaria. > > Also, the addition of the matroid database files inside the folder `ext_data` needs to be noted in the file `setup.cfg.m4`. This issue was brought to my attention by @antonio-rojas. > > These follow the merging of sagemath#37140. > > I also took the initiative of reordering alphabetically the list of paths inside `setup.cfg.m4` and of removing the line `ext_data/images/*` which seemed redundant. <!-- Why is this change required? What problem does it solve? --> <!-- If this PR resolves an open issue, please link to it here. For example "Fixes sagemath#12345". --> <!-- If your change requires a documentation PR, please link it appropriately. --> ### 📝 Checklist - [x] The title is concise, informative, and self-explanatory. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. URL: sagemath#37231 Reported by: gmou3 Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe
Remove matroid database files (
ext_data/matroids
) and use newly createdmatroid-database
pypi package instead.Links to
matroid-database
: pypi, githubOutdated:
📝 Checklist