Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Matsumoto's matroid database and AllMatroids() driver function #37140

Merged
merged 4 commits into from
Feb 2, 2024

Conversation

gmou3
Copy link
Contributor

@gmou3 gmou3 commented Jan 22, 2024

This enables the generation of all matroids of certain number of elements (and, optionally, of certain rank or type). There exist restrictions based on the underlying database (Yoshitake Matsumoto's Database of Matroids: https://www-imai.is.s.u-tokyo.ac.jp/~ymatsu/matroid/index.html). Note that the database files consume ~75MB of space, which drops to ~1.5MB upon a standard compression.

Partially addresses #35464 (the type can be set to paving).

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

@gmou3
Copy link
Contributor Author

gmou3 commented Jan 22, 2024

@tscrim

Copy link
Collaborator

@tscrim tscrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. The one other comment to consider would be to have an explicit link/reference to the database in AllMatroids.

src/sage/matroids/database_collections.py Outdated Show resolved Hide resolved
src/sage/matroids/database_collections.py Outdated Show resolved Hide resolved
Copy link

Documentation preview for this PR (built with commit e4e2d24; changes) is ready! 🎉

Copy link
Collaborator

@tscrim tscrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 30, 2024
…driver function

    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->
This enables the generation of all matroids of certain number of
elements (and, optionally, of certain rank or type). There exist
restrictions based on the underlying database (Yoshitake Matsumoto's
Database of Matroids: https://www-
imai.is.s.u-tokyo.ac.jp/~ymatsu/matroid/index.html). Note that the
database files consume ~75MB of space, which drops to ~1.5MB upon a
standard compression.

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->
Partially addresses sagemath#35464 (the type can be set to `paving`).

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.
    
URL: sagemath#37140
Reported by: gmou3
Reviewer(s): Travis Scrimshaw
@vbraun vbraun merged commit 357282e into sagemath:develop Feb 2, 2024
20 of 21 checks passed
@gmou3 gmou3 deleted the all_matroids branch February 2, 2024 21:47
@antonio-rojas
Copy link
Contributor

This doesn't work on an installed sagemath because the data files are not being installed. You need to add them to src/setup.cfg.m4 under [options.package_data]

@gmou3 gmou3 mentioned this pull request Feb 3, 2024
3 tasks
@gmou3
Copy link
Contributor Author

gmou3 commented Feb 3, 2024

Thank you. Can you please check the PR I made for this (#37231)? This issue did not appear for me so I can't witness any functional difference.

@tornaria
Copy link
Contributor

tornaria commented Feb 4, 2024

This PR almost doubles the size of src/sage:

$ du -sh src/sage
180M	src/sage
$ du -sh src/sage/ext_data/matroids/
75M	src/sage/ext_data/matroids/

This is not the way databases are handled in sagemath. The normal way would be to make an optional spkg containing the database. The nicer/easier way would be to create a pypi package similar to https://github.com/sagemath/conway-polynomials (btw, in conway-polynomials the actual data files are xz compressed and transparently decompressed as needed)

Also, as @antonio-rojas pointed out, this broke sagemath-standard.

@tornaria tornaria added disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ s: needs work labels Feb 4, 2024
@gmou3
Copy link
Contributor Author

gmou3 commented Feb 4, 2024

I am not familiar with your proposed method of resolving this but I could check it out.

Would compressing the files down to ~1.5MB be satisfactory? Then a decompression (and possibly a deletion of the decompressed file) would need to occur when the function is called.

@tornaria
Copy link
Contributor

tornaria commented Feb 4, 2024

I am not familiar with your proposed method of resolving this but I could check it out.

Have a look at https://github.com/sagemath/conway-polynomials. It's an independent python package so it needs some boilerplate but not too much (you could base your package on this, which is quite modern).

The advantage of this is that the database can be used completely independent of sage, can have its own release schedule, etc. This would be a package very easy to install using pip or create a distro package (it is for conway-polynomials).

What I think would be reasonable is that your package exports a function analogous to the AllMatroids() function (with all the search and/or filter functionality you want to provide) but returning the data as python types so it's usable independently of sage (IMO, ideally the database package returns the data as close to the original source as possible).

Then your AllMatroids() function in sagemath just imports the function from this package and wraps the data in sage types (calling the Matroid() constructor, etc).

Would compressing the files down to ~1.5MB be satisfactory?

I still think this is better handled in a separate package. It may seem small, but you have to bear in mind that sage uses a several databases, some standard and some optional, and if we included every database in the source code, it would all add to a lot, not just space but maintaining work, etc. A separate package is easier to maintain and update independently of sagemath (as long as the interface doesn't change) and can be used without sagemath (this might be useful for someone, I don't know).

Then a decompression (and possibly a deletion of the decompressed file) would need to occur when the function is called.

That's not necessary. The way you use the files, python makes it very easy to switch to reading the compressed file, decompressing on the fly. See the function _open_database() here https://github.com/sagemath/conway-polynomials/blob/master/src/conway_polynomials/__init__.py#L94

Most probably you can copy this function _open_database() almost verbatim, except that since you have several data files you would add a filename argument. This function also uses importlib.resources so the files are placed in the python package itself as package data and you don't have to worry at all about file locations, etc. So really, all you need to do in your code is essentially replace open() by _open_database().

I hope this helps. Maybe @orlitzky has something useful to add (he wrote the conway_polynomials python package).

@gmou3
Copy link
Contributor Author

gmou3 commented Feb 4, 2024

The _open_database() recommendation sounds good. I don't think that a separate package would be of any other use (unless maybe if I also added the decoding function from the revlex encoding to a list of matroid bases).

@gmou3 gmou3 restored the all_matroids branch February 4, 2024 23:31
@gmou3 gmou3 deleted the all_matroids branch February 5, 2024 01:14
@orlitzky
Copy link
Contributor

orlitzky commented Feb 5, 2024

I also think a separate package is a good idea. A big benefit is that it de-couples the download & upgrade procedures of the thing that changes rarely (the database) from the thing that changes often (the code in sagelib). Users might install the database package once, and then upgrade sagelib twenty times before having to update to the database package. It's nice in that example to not have to re-download and re-install the big database those twenty times.

The database link in the original post isn't working for me right now, but another benefit is that a separate package can have a separate license if those restrictions would cause problems for sage itself.

@tornaria
Copy link
Contributor

tornaria commented Feb 7, 2024

@vbraun @mkoeppe what's the way to proceed with this PR?

Is it ok to add 75M to the source code just like that? No trigger warnings? No sage-devel discussion?

This also broke sagemath-standard for a --disable-editable build, and also packages installed via wheels (either from pypi or built from pkgs/sagemath-standard) including distro packages.

The issue is some files not installed, so in-place editable builds work. I guess testing only covers editable builds?

A better way for this is being worked out in #37231, but IMHO this PR should be reverted for the time being.

@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 7, 2024

Is it ok to add 75M to the source code just like that? No trigger warnings? No sage-devel discussion?

I think it was a simple oversight in the review of this PR. I agree that it needs an urgent fix. To me it does not make much of a difference whether to revert the PR or to merge the current version of #37231 (as a hotfix only; not the final state, which should be adding an optional spkg).
I think a more relevant question is: Is it problematic to have the 75M addition in the git history of the develop branch, so should the fix be done by a force push to develop?

$ git clone --single-branch --branch 10.3.beta6 https://github.com/sagemath/sage.git clone-single-branch-10.3.beta6
$ du -sh clone-single-branch-10.3.beta6
549M	clone-single-branch-10.3.beta6
$ git clone --single-branch https://github.com/sagemath/sage.git clone-single-branch
$ du -sh clone-single-branch
625M	clone-single-branch
$ git clone --single-branch --branch ext_data_matroids https://github.com/gmou3/sage.git clone-single-branch-37231
$ du -sh clone-single-branch-37231
547M	clone-single-branch-37231

So -- thanks to the added database's extraordinary compression ratio -- at least in terms of repo size there's not a problem that would warrant a force push. (I don't know if we should be concerned about other potential adverse effects, perhaps on the speed of some git operations.)

@tscrim
Copy link
Collaborator

tscrim commented Feb 7, 2024

We have no policy about adding database-type files, much less anything that is documented in our dev guide. As such, there is no mistake/oversight with the review.

(Side note: We might want to be careful about the definition of a database. What if I write a bunch of Python functions, one for each object?)

The broken builds in certain situations is a problem, but we don’t have any standard testing for this. This is why we have beta releases.

A straight reversion of this PR would also set a bad precedent IMO; it should be handled according to our standard practices of a followup PR. We can also have @vbraun wait for the next release until the issue is settled as we are all trying to figure out the best way forward.

@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 7, 2024

@tscrim It's not like we're trying to charge you a fine for this.

We might want to be careful about the definition of a database. What if I write a bunch of Python functions, one for each object?

Well, adding 75 MB of source code in one PR would certainly also raise some objections.

@tscrim
Copy link
Collaborator

tscrim commented Feb 7, 2024

@tscrim It's not like we're trying to charge you a fine for this.

You’re still trying to lay the blame at my feet for this, which is unfair.

@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 7, 2024

I guess testing only covers editable builds?

@tornaria That's right.

Two things that we might want to do:

@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 8, 2024

You’re still trying to lay the blame at my feet for this, which is unfair.

It's true that your feet should not be blamed for any of this!

vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 11, 2024
    
This is a minimal PR to allow building with python 3.12, given the
circumstances surrounding sagemath#36181.

I've tested this works with system python 3.12 by:
```
./bootstrap
./configure --disable-doc --disable-editable --enable-system-site-
packages
env 'MAKE=make -j36' make
./sage -tp 36 --all
./sage -tp 36 --all --long
```

It only gave the expected failures in
`src/sage/matroids/database_collections.py` from sagemath#37140.

Since scipy 1.12 works ok after sagemath#37123 and it's just a one-liner to
change the required version, I included it here.

No attempt is made at upgrading anything in sage-the-distro.

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.

EDIT: rebased on top of sagemath#36983 to address reviewer suggestion.
    
URL: sagemath#37270
Reported by: Gonzalo Tornaría
Reviewer(s): Aliaksei Urbanski, Gonzalo Tornaría, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 18, 2024
    
Remove matroid database files (`ext_data/matroids`) and use newly
created `matroid-database` pypi package instead.

Links to `matroid-database`: [pypi](https://pypi.org/project/matroid-
database/), [github](https://github.com/gmou3/matroid-database)

Outdated:
> Compress (via `xz -e`) the matroid database files (~75MB to <1MB). The
files are now opened with `lzma.open()` and the file paths are located
through `sage.ext_data` and not relative to `SAGE_SRC` or
`SAGE_EXTCODE`. These changes attempt to address the issues raised by
@tornaria.
>
> Also, the addition of the matroid database files inside the folder
`ext_data` needs to be noted in the file `setup.cfg.m4`. This issue was
brought to my attention by @antonio-rojas.
>
> These follow the merging of sagemath#37140.
>
> I also took the initiative of reordering alphabetically the list of
paths inside `setup.cfg.m4` and of removing the line `ext_data/images/*`
which seemed redundant.

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#37231
Reported by: gmou3
Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 19, 2024
    
Remove matroid database files (`ext_data/matroids`) and use newly
created `matroid-database` pypi package instead.

Links to `matroid-database`: [pypi](https://pypi.org/project/matroid-
database/), [github](https://github.com/gmou3/matroid-database)

Outdated:
> Compress (via `xz -e`) the matroid database files (~75MB to <1MB). The
files are now opened with `lzma.open()` and the file paths are located
through `sage.ext_data` and not relative to `SAGE_SRC` or
`SAGE_EXTCODE`. These changes attempt to address the issues raised by
@tornaria.
>
> Also, the addition of the matroid database files inside the folder
`ext_data` needs to be noted in the file `setup.cfg.m4`. This issue was
brought to my attention by @antonio-rojas.
>
> These follow the merging of sagemath#37140.
>
> I also took the initiative of reordering alphabetically the list of
paths inside `setup.cfg.m4` and of removing the line `ext_data/images/*`
which seemed redundant.

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#37231
Reported by: gmou3
Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe
@tornaria tornaria removed s: needs work disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ labels Feb 19, 2024
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 22, 2024
    
Remove matroid database files (`ext_data/matroids`) and use newly
created `matroid-database` pypi package instead.

Links to `matroid-database`: [pypi](https://pypi.org/project/matroid-
database/), [github](https://github.com/gmou3/matroid-database)

Outdated:
> Compress (via `xz -e`) the matroid database files (~75MB to <1MB). The
files are now opened with `lzma.open()` and the file paths are located
through `sage.ext_data` and not relative to `SAGE_SRC` or
`SAGE_EXTCODE`. These changes attempt to address the issues raised by
@tornaria.
>
> Also, the addition of the matroid database files inside the folder
`ext_data` needs to be noted in the file `setup.cfg.m4`. This issue was
brought to my attention by @antonio-rojas.
>
> These follow the merging of sagemath#37140.
>
> I also took the initiative of reordering alphabetically the list of
paths inside `setup.cfg.m4` and of removing the line `ext_data/images/*`
which seemed redundant.

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#37231
Reported by: gmou3
Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe
@mkoeppe mkoeppe added this to the sage-10.3 milestone Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants