Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Candidates for easy separation from standard library #473

Open
metagn opened this issue Aug 12, 2022 · 19 comments
Open

Candidates for easy separation from standard library #473

metagn opened this issue Aug 12, 2022 · 19 comments

Comments

@metagn
Copy link
Contributor

metagn commented Aug 12, 2022

Because the RFC is long, here is the current progress first:

Current progress

RFC

From #437:

More of the standard library should be moved to external Nimble packages. How far we get with this is unclear.

I am in no position of authority in this community but I thought I would make this RFC to expand on this point as some clarity would be nice.

Why?

There are benefits to moving out certain modules from the standard library, not just for the sake of keeping the core of Nim minimal, but also for the development of the modules themselves. Some of these benefits are:

  • modules can be updated as frequently as required, without needing to be tied to Nim's release schedule
  • multiple Nim versions can be supported at once
  • semantic versioning may be used
  • core Nim developers can be relieved of the responsibility of maintaining these modules
  • even if they are abandoned as a result of being moved out, they are not dead weight

There are many obvious exceptions, such as when they are closely tied to Nim's language development or when they are too popular to feasibly move out.

How?

There are many ways to go about this but only people with authority can make the decisions. The following are general guidelines:

  • these modules should be installable from Nimble, ideally with the same package name to make older code work easier
  • commit history should ideally be kept
  • tests & CI should be adapted, maybe even included in Nim test suite as important packages
  • repositories should ideally be under the nim-lang organization, or seem "official" in some way
  • maintainers should include people who aren't core Nim developers

Backwards compatibility

Assuming the worst case (these modules are removed and they are not distributed with Nim), all most people would have to do is add these modules to their .nimble file/manually run nimble install. While this is basically nothing for most people there might still be call for including them in some kind of distribution, see #476 for more recent discussion about this.

Which modules?

This is the main focus of this RFC.

Some history

There is already a fair amount of past discussion of whether or not to move out certain modules, for example logging in #146.

A "graveyard" repository was created in the past for some functioning modules in the standard library that were not expected to receive maintenance to be moved to.

cgi and cookies were considered to be put in this repository, but this was rejected, as these modules are "used and reasonably free of issues [and] don't cost us much maintenance". oids and smtp were removed at first, but then put back in the standard library, for similar reasons.

In recent times, smtp has had a PR merged in Nov 2021 that seems like it was made just to make smtp work at all for some servers. However this PR is not on a stable release yet as of Nov 2022, nearly a year since, as a specific effort was not made to backport it.

Recently bigints, previously a nimble package, as a compromise to being included in the standard library, has been moved to the nim-lang organization and received a slight uptick in productivity. But it seems the only person who can merge PRs is narimiran and activity has staled after a while despite there seemingly being people interested in contributing.

Metrics

Current independent stdlib modules

The following modules (among others that are too new (like std/with) or just happen to be unused despite being important (like volatile)) in the standard library have no (or close to no) dependents in the Nim repo itself and have self-contained tests:

  • asyncftpclient
  • cgi (used in tests/gc/growobjcrash which might be disabled and only uses cgi.decodeData)
  • complex
  • cookies (used by cgi)
  • critbits
  • db_* modules
    • db_sqlite, db_postgres, db_obdc, db_mysql all depend on db_common
    • db_sqlite is used in a debug block in compiler/sighashes and a single unrelated test, tests/typerel/tnoargopenarray (and a Nim in Action test)
    • db_mysql, db_odbc, db_postgres are tested together in tests/stdlib/tdb
  • associated db wrappers in lib/wrappers (mysql, odbcsql, postgres) except sqlite3
    • sqlite3 is used in nimtracker?
  • punycode
  • lenientops
  • logging (used in tests/async/tlambda)
  • nre (used in tests/stylecheck/taccept)
  • oids
  • parsecsv
  • parsesql
  • rationals
  • re (used by nimgrep, in tests tests/misc/tinvalidnewseq, tests/manyloc/keineschweine, Nim in Action concurrency_regex.nim)
  • smtp
  • stats (used in tests/stdlib/trandom)
  • unidecode
  • varints
  • mersenne (moved to graveyard)

This gives a good blueprint for which modules could be easily moved out.

Usage stats

The following modules that were listed above are the most common based on Nimble usage stats (in parentheses are independent modules that are unlisted above for being too new etc):

  • (with, 73)
  • oids, 33
  • cgi, 29
  • logging, 25
  • complex, 23
  • stats, 22
  • (decls, 16)
  • lenientops, 14
  • critbits, 13
  • (genasts, 12)
  • re, 10
  • cookies, 9
  • varints, 9
  • the rest <= 8

None of these strike me as too popular to move out.

Another metric that could be added here is development activity i.e. commits that aren't simple universal refactorings, but this would be somewhat difficult.

Subjective conclusion

This is my assessment of the modules at hand:

Small modules (<500 lines including docs and whitespace)

These require seemingly little maintenance, and have a noticeable amount of users.

  • cgi
  • complex
  • cookies
  • critbits
  • punycode
  • lenientops
  • oids
  • rationals
  • stats
  • unidecode (small but includes 300 kb data file)
  • varints

These seem to be harmless and it might be annoying to remove them, although they are essentially small Nimble packages. I think there should be no guarantee that these modules are maintained and they should only stay as utility modules. If there was a way to have these in a separate pile from the rest of the standard library, but have them easily accessible anyway, it would be nice.

Medium sized

These may be used in important applications but aren't super common. They occasionally get important patches.

  • asyncftpclient
  • smtp
  • logging
  • parsecsv
  • parsesql

These are the sweetspot when it comes to the friction of moving out IMO, with a niche userbase that probably wants the best version of these modules. For this reason I have made an example demonstration of moved out asyncftpclient and smtp.

parsecsv was only recently made independent in the Nim repo, with the removal of the nimweb tool. It's not as big or often-patched as the rest of these modules either, and it fits with the rest of the parse modules in the standard library, so it might have less priority to move out.

parsesql is kind of in the same boat, but it's a bigger module and more prone to needing patches. There is currently a competing version of it on Nimble that has extra features and doesn't have commit history. Completely unsure what to do with this module.

Big wrappers

These are relatively commonly used, but may not be that hard to move out.

  • re/nre, along with separate package for the pcre wrapper (nre itself was ported from a nimble package)
  • db_* modules, each in their own package with their own wrapper, all having a db_common dependency

These are a bit more effort due to their size and number but moving them out would make the codebase a lot cleaner and wouldn't affect a very large number of users.

Others

Some modules I haven't mentioned yet but I think I've seen discussion about are json and http/ssl modules. These would be the biggest challenge and I'm guessing there's no plans to move them out until an entirely new design is conceived.

@c-blake
Copy link

c-blake commented Aug 14, 2022

rationals and complex might benefit from having their own repos for faster development, but they might be used so commonly that we should keep them in.

To follow-up on this - a stronger analysis of relocation impact would include usage by other nimble packages, not simply usage by the compiler package internally..Something like https://forum.nim-lang.org/t/9323 but at the module level { and where a simple table is all that is needed, not a maybe large image }. It is more work to collect this data since usage is not declared in a .nimble file but is import/include and possibly guarded by when, etc. { Such a dependency analyzer, if it does not already exist, also has more broad value, though. } Each module need not be analyzed at the level of detail in nim-lang/Nim#11430 for one parameter to one symbol, but I do think more detail would be helpful here.

Even this stronger analysis will not cover "dark code" not in the nimble directory. Problem context can guide that sometimes. As 3 examples, A) cgi/cookies seem more prone to "one-off web interfaces" someone might never publish. B) Some people like disruptek just gave up on nimble entirely and have many non-nimble packages that are still public. C) rationals, complex, and lenientops - being "just math" often built into other languages/syntax themselves makes them more appropriate for staying put. This incompleteness caveat having been made, the nimble directory is still a lot better than just the compiler.

It is probably the unwritten plan already, but there should probably be at least a major release cycle like nim-1.8 where the modules are first deprecated only to be moved the next cycle. Then whatever falls between the cracks of the ahead-of-time analysis can perhaps be un-deprecated. The gap between these deprecate-move releases in real-time should hopefully include a lot of word-of-mouth chatter/Forum posts/etc. among users to test any packages they use, and hopefully last at least several months so that more casual users who are also depended upon somehow can respond somehow.

Maybe all of this is super-obvious/implicit in The Plan, but I thought I should say something.

@metagn
Copy link
Contributor Author

metagn commented Aug 14, 2022

there should probably be at least a major release cycle like nim-1.8

I'm not sure if this is going to be done either, but if it had to be, it would be based off the 1.6 branch with commits from devel cherrypicked in. I only know that it's been established that devel is the 2.0 branch.

I was actually going to mention that asyncftpclient has basically no uses on github search (other than Nim repo clones), but it would have been misleading because I've used asyncftpclient in a private repo myself. Though I ended up just copying asyncftpclient into my codebase and making edits on it because it was broken.

These modules definitely aren't unused, them being used is just 1 factor in moving them out of the stdlib. rationals and complex are kind of special because they provide things like a standard Complex type, so they will be a common library dependency. asyncftpclient and smtp are less "abstract" libraries and are usually shallow dependencies/hidden in the implementation, dealing with them is usually as simple as a single nimble file requires addition.

It is still good to get awareness/feedback though. I don't think a Nim 1.8 necessarily will help with this as warnings are often ignored, and it might not be maintained too long until 2.0 or just not enough people will use it. A 2.0 preview maybe, but some breaking changes are hard to warn for, and it's easier to put them in 2.0 directly. A forum post asking what people think about modules moving out might be the best thing to do, I just personally don't involve myself with the forum that much.

I am fine with the idea of putting warnings in 2.0 and removing things in 2.x instead, especially for the standard library. I believe this is the strategy for system removals (warnings haven't even been added yet). std/sums is "deprecated" in devel and the nimble package sums is suggested instead. So maybe something similar can be done for the other modules.

@c-blake
Copy link

c-blake commented Aug 14, 2022

"Often ignored" != "always ignored" and it seems needlessly harsh to punish whatever subset of users does pay attention to warnings with an abrupt change when Nim has nice mechanisms for SW evolution. So, the 2.0 deprecation/2.x removal sounds better to me than just direct move at 2.0, BUT I believe some people are also pretty attached to SemVer notions (as in 2.x changing from deprecation -> hard errors == big breaking change in a minor rev == bad).

So, beats me what the best plan is. I would have said 1.8 deprecated for N >= 3 months except this seems Off Plan from your account, and maybe I am too conservative. Or maybe others were too optimistic in branding devel 2.0 given the status of change-work? @Araq had mentioned "once in a decade" in The Roadmap thread.

Anyway, "how used" more broadly than internal does still seem like useful metadata in decision making, even if it is only 1 factor { and someone may have dep tools like I mentioned :-) }.

@metagn
Copy link
Contributor Author

metagn commented Aug 14, 2022

It's fine to leave versions of them in the standard library for any amount of time then. The key thing is moving them to nim-lang repos. We can do {.warning: "this module may not be up to date, a maintained version of this module is at https://github.com/nim-lang/modulename which can be installed with `nimble install modulename`".}

@hugosenari
Copy link

..Something like https://forum.nim-lang.org/t/9323 but at the module level

Isn't fancy like that but is what I have for now all imports of nimble packages in packages.json.

@c-blake
Copy link

c-blake commented Sep 1, 2022

While a fine start and certainly better than nothing, this looks like simple grep output. E.g., some results show imports within Nim strings that are really Python code.

You might be able to get something a bit stronger by just capturing the "CC: ....foo.nim" logging output of compiles and "uniquifying them" within packages/tests for such/etc. While it might be high effort to even get so much old code to compile, we also probably mostly care only about packages that lib/tool consumers might care about. This happens to translate (roughly & conveniently) to "compiling with low effort".

In a "not really compiling all the way" sense, nimsuggest may either be usable as-is or already be close to a dependency analyzer. Or there may be some other compiler feature somewhere already to just use (like the -M family of options that C compilers possess). Unlike a grep, these more precise ideas will probably ignore when defined(..) guarded use cases, though. Pros & cons. :-)

@haxscramper
Copy link

Ordered usage stats from Aug 2021 #398 (comment)

@Varriount
Copy link

My take is that any library that needs any of:

  • A complete redesign
  • Backwards-incompatible changes
  • Numerous additions

should be split off. They don't necessarily need to be abandoned, but they should be split off into one or more repositories that say "this module is under semantic-versioned development".

@a-mr
Copy link

a-mr commented Sep 1, 2022

SQL stuff seems out of place for stdlib also:
lib / impure / [ db_mysql.nim , db_odbc.nim , db_postgres.nim , db_sqlite.nim ]

@metagn
Copy link
Contributor Author

metagn commented Sep 3, 2022

I have made a repository for smtp as a start/example, preserving commit history using this. If this is fine to move to nim-lang, I can do this for other modules as well.

Edit: parsesql already exists as a nimble package, which appears to be a fork of the stdlib version (without commit history) with new additions. I should be able to add the stdlib commits as well as @bung87's, or we can just use the existing repository without the stdlib commits. bung87's version is also old, there have been some changes since in the stdlib that we would have to add as well with this method.

@metagn
Copy link
Contributor Author

metagn commented Sep 13, 2022

I have updated the smtp port a bit to move the "test" out from the module and try to compile it for CI. If this won't be moved to nim-lang anytime soon then I will make the PR to publish it as a package and the link can be changed later.

I will likely make a repo for asyncftpclient later, maybe parsesql too if I can minimize & apply @bung87's commits. Unsure about logging because the discussion in #146 about it being in the standard library seems leaning towards keeping it in/inconclusive.

The impure packages are difficult: nre was originally a nimble package which is pretty outdated but still indexed, not to mention it uses the pcre wrapper which re uses as well, so moving both of these out would likely also mean moving out pcre to its own package. The db_ modules also have wrappers (see wrapper folder) and they don't seem to be used outside their own modules, but I'm not sure if bundling the wrappers with the implementations is too assuming. There is also db_common in lib/pure which all the modules depend on, one might think this being a "common API" warrants it staying in the standard library (this is a concept that is brought up a lot talking about modules like logging) but there is nothing else like it currently and it is practically possible to just make it a package dependency of all the db_ modules.

Nothing has to be done to these modules in the standard library yet. I intend to submit the packages to the registry for people to use though and uphold their maintenance for the time being (port commits from stdlib, accept bugfix/improvement PRs, add collaborators).

@bung87
Copy link

bung87 commented Sep 13, 2022

The ideal organize db modules is put them into one nimble package with db_common, since parsesql is not full feature and static sql check isn't enabled, parsesql should be another package, once it's ready for static sql check, make it as depency of the db module. there are some issues opened for db libs, not sure what's the right occasion move out db libs.

@metagn
Copy link
Contributor Author

metagn commented Sep 14, 2022

@metagn
Copy link
Contributor Author

metagn commented Sep 17, 2022

Added oids as a candidate, issues like nim-lang/Nim#10604 seem to imply it would benefit from active maintenance independent of Nim versions. It was removed and added back like smtp, but that was because it was added to graveyard, not because it was moved out to its own repository. So I see it as appropriate to list here.

@metagn
Copy link
Contributor Author

metagn commented Nov 25, 2022

Rewrote the RFC to be more readable, the move-out demonstrations are now at https://github.com/nim-old-stdlib where there is also a guide on how to do it (edit: moved to https://gist.github.com/metagn/f808a4b398cece6b8361d57f06efd259). Anyone who wants to keep the repos there updated or make their own repos or transfer the repos is free to ask for permissions in the organization.

@metagn metagn changed the title Candidates for separation from standard library Candidates for easy separation from standard library Nov 26, 2022
@pietroppeter
Copy link

for cross reference, @Araq today mentioned in this forum thread:

The following modules are being moved to Nimble packages:

  • smtp
  • asyncftpclient
  • punycode
  • The db_ subsystem.

(I hope this list is complete.)

No further changes are planned for the time being, it's time to release v2.

@arnetheduck
Copy link

It seems that if packages have a vibrant nimble alternative already (for example because they were forked due to lack of maintenance in stdlib), they would make good candidates for removal as well, else they only create confusion and poor UX for newcomers to Nim (that inevitably will have to go through the same disappointment as the ones doing the forking)

@mratsim
Copy link
Collaborator

mratsim commented Jan 10, 2023

#501 (comment)

Sorry about that but it's not all about the "overhead of an if statement". More code and more features and more code paths add complexity to a codebase. And complexity has already slowed down Nim's development to the point of harming its future.

So I have to say "no" to almost any new feature or API addition.

What would help? Moving async to its own Nimble package might

@metagn
Copy link
Contributor Author

metagn commented Aug 28, 2023

Some updates not covered above:

The checksums modules (md5, sha1) were also moved out, meaning the compiler depending on certain modules is not really an issue anymore.

There also might be a new candidate, htmlparser:

At the very least not both of htmlparser and fusion/htmlparser should exist, and probably the best solution by a margin is moving into a package.

Edit: Seems to have been moved out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants