Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.0.0.0: Migrate to libpcre2 #8

Closed
wants to merge 1 commit into from

Conversation

peteryland
Copy link

Since the old libpcre is now obsolete for some years, migrating to libpcre2 is probably a good idea. This is a mostly-semantic API change that risks catching some people off guard, hence the major-major version number bump. I've kept the changes to a minimal set to maintain as much backward-compatibility as possible.

Fixes: #1

@peteryland
Copy link
Author

Sorry, further testing has revealed this is not yet ready. Please bear with me and I'll update the pull request soon.

@peteryland peteryland force-pushed the master branch 4 times, most recently from 7c0049f to 0429621 Compare February 13, 2025 20:30
@peteryland
Copy link
Author

I've had to do a lot more than I'd expected in the end!

Anyway, I've also added a couple of smoke tests and they seem to be passing. It would be good to add tests with UTF-8 ByteStrings and usage of some options and corner cases where it might differ from the previous versions, but for now I'm just happy that it's working reasonably.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noise

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ouch. Yeah, sloppy indeed. I'll remove it now.

@phadej
Copy link

phadej commented Feb 13, 2025

I'm not a fan of this change. Let's rather have a new package, regex-pcre2. regex-pcre exists, libpcre is old, but it's not going anywhere for foreseeable future. Don't force us (e.g. Stackage) to migrate all at once. old libpcre and new libpcre2 can coexist, so should Haskell wrappers.

@peteryland
Copy link
Author

I'm not a fan of this change. Let's rather have a new package, regex-pcre2. regex-pcre exists, libpcre is old, but it's not going anywhere for foreseeable future. Don't force us (e.g. Stackage) to migrate all at once. old libpcre and new libpcre2 can coexist, so should Haskell wrappers.

It was indeed something I considered and for sure it would be easy enough to do that. The main reasons I felt it would be better as a new version of this package are:

  • libpcre is very much obsolete
    • it is end of life, and is no longer being actively maintained
    • as a result, it will not be shipped in the next stable version of Debian and its derivatives meaning that any package that depends on this package will also be dropped from Debian and derivatives
      • it is also a blocker for getting HLS into Debian
    • libpcre2 has existed already for 10 years as its replacement
  • probably the vast majority of use cases will not require any changes (although I haven't actually checked this)
    • I would expect that few packages use compile options and exec/match option usage would be very rare
    • in any case, the most common compile options are unchanged in the API
    • the semantic differences are very subtle and affect corner cases only, and in general fix cases where the behaviour didn't previously match Perl's
  • lastly, the haskell tooling allows the coexistence of multiple versions of libraries, and packages can choose when/if they want to build against new versions of dependencies

@phadej
Copy link

phadej commented Feb 14, 2025

lastly, the haskell tooling allows the coexistence of multiple versions of libraries, and packages can choose when/if they want to build against new versions of dependencies

This is not true. You cannot have multiple versions of the same library. E.g. Stackage would be required to migrate at once. (Maybe there aren't many users of regex-pcre and it's doable).

This is the very same reason why debian shipped old and new libpcre for quite some time. Perfectly, regex-pcre2 would existed for 5-6 years.

Abrupt change will be abrupt.

probably the vast majority of use cases will not require any changes (although I haven't actually checked this

Probably every package which uses options will need to be updated, execBlank from matchBlank etc.


But to be fair, most packages use regex-tdfa as it's pure Haskell implementation, i.e. less head ache. IMHO if there are no good reason to specifically use PCRE regexes HSL should use that too.

@andreasabel
Copy link
Member

Thanks for the contribution!

CI is failing... I updated CI on master, so this branch should be rebased to start off with the most recent version of the CI.
E.g.

Wrap.hsc:390:84: error: ‘PCRE2_ALT_EXTENDED_CLASS’ undeclared

If the problem is that additional packages are needed on the CI image, this could be possibly fixed by adding an apt section to cabal.haskell-ci and then rebuild the CI using haskell-ci regenerate (need Haskell-CI from its master).

@andreasabel
Copy link
Member

When it comes to regex-pcre-1.0.0.0 or a new package regex-pcre2, I would usually tend to the second alternative.
(I think it makes sense to subset the bindings regex-pcre together with pcre, and publish a new bindings package for pcre2.)

But maybe the users of regex-pcre should have a say here as well.
On stackage the following users are listed:

@peteryland
Copy link
Author

lastly, the haskell tooling allows the coexistence of multiple versions of libraries, and packages can choose when/if they want to build against new versions of dependencies

This is not true. You cannot have multiple versions of the same library. E.g. Stackage would be required to migrate at once. (Maybe there aren't many users of regex-pcre and it's doable).

Perhaps stackage cannot handle it, but ghc and cabal have no problems with multiple versions of the same lib. That said, I concede that it's better not to have this as it can lead to diamond dependencies.

But to be fair, most packages use regex-tdfa as it's pure Haskell implementation, i.e. less head ache. IMHO if there are no good reason to specifically use PCRE regexes HSL should use that too.

It's not a direct dependency of HLS, but the intermediate dep could be updated indeed.

@peteryland
Copy link
Author

If the problem is that additional packages are needed on the CI image, this could be possibly fixed by adding an apt section to cabal.haskell-ci and then rebuild the CI using haskell-ci regenerate (need Haskell-CI from its master).

Thanks. I'm not super familiar with Haskell CI, but I'll look into it.

@peteryland
Copy link
Author

When it comes to regex-pcre-1.0.0.0 or a new package regex-pcre2, I would usually tend to the second alternative. (I think it makes sense to subset the bindings regex-pcre together with pcre, and publish a new bindings package for pcre2.)

Fair enough. I'll do it that way then.

@peteryland
Copy link
Author

regex-pcre2 is now on Hackage with a PR pending for Stackage. Thanks all for the comments/suggestions.

@peteryland peteryland closed this Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PCRE2 support?
3 participants