Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cabal 3.10.2.1 cannot parse pkg-config: hGetContents: invalid argument (invalid byte sequence) #9608

Closed
elldritch opened this issue Jan 10, 2024 · 21 comments · Fixed by #9609

Comments

@elldritch
Copy link

Describe the bug

Trying to run cabal build on https://github.com/fossas/fossa-cli/tree/64c2f202dc923e3784267cf1d974a952da5baf09 works with Cabal 3.6.2.0, but not with Cabal 3.10.2.1. Here's the error message:

$ cabal build --minimize-conflict-set --verbose
this build was affected by the following (project) config files:
- /home/leo/src/fossa/fossa-cli/cabal.project
Running: /home/leo/.ghcup/bin/ghc --print-global-package-db
Reading available packages of hackage.haskell.org...
Using historical state as of 2024-01-02T10:02:41Z as explicitly requested (via
command line / project configuration)
index-state(hackage.haskell.org) = 2024-01-02T10:02:41Z (HEAD =
2024-01-10T22:03:12Z)
Running: /usr/bin/pkg-config --version
Running: /usr/bin/pkg-config --variable pc_path pkg-config
Running: /usr/bin/pkg-config --version
Running: /usr/bin/pkg-config --list-all
Failed to query pkg-config, Cabal will continue without solving for pkg-config
constraints: output of /usr/bin/pkg-config: hGetContents: invalid argument
(invalid byte sequence)
Resolving dependencies...
CallStack (from HasCallStack):
  withMetadata, called at src/Distribution/Simple/Utils.hs:368:14 in Cabal-3.10.2.1-inplace:Distribution.Simple.Utils
Error: cabal: Could not resolve dependencies:
[__0] trying: spectrometer-0.1.0.0 (user goal)
[__1] trying: lzma-0.0.1.0 (dependency of spectrometer)
[__2] rejecting: lzma:+pkgconfig (conflict: pkg-config package liblzma>=5.2.2,
not found in the pkg-config database)
[__2] rejecting: lzma:-pkgconfig (manual flag can only be changed explicitly)
[__2] fail (backjumping, conflict set: lzma, lzma:pkgconfig)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: lzma (8), spectrometer (7),
lzma:pkgconfig (3)

This is strange, because /usr/bin/pkg-config --list-all exits zero, and /usr/bin/pkg-config --modversion liblzma outputs 5.4.5. Here are some pkg-config outputs:

$ /usr/bin/pkg-config --version
2.1.0

$ /usr/bin/pkg-config --variable pc_path pkg-config
/usr/lib/pkgconfig:/usr/share/pkgconfig

I have also the output of /usr/bin/pkg-config --list-all, available here: pkg-config-list-all.txt.

I'm on Arch Linux (which symlinks pkg-config to pkgconf), and I suspect this may be related to #8930, which also impacts me. I've been having troubles with pkg-config in general (see #8495 and #8494) since Cabal 3.8 when the pkg-config logic was changed. The commit in this bug report also does not build on Cabal 3.8.1.0 for me, failing with the same error message.

To Reproduce

cabal build on https://github.com/fossas/fossa-cli/tree/64c2f202dc923e3784267cf1d974a952da5baf09 using Cabal 3.8.* or 3.10.* reliably reproduces this issue for me, although it may be because of my machine's specific pkg-config database.

Expected behavior

I expect the build to succeed building in Cabal 3.10, since it succeeded in Cabal 3.6, and my pkg-config database clearly has the required package.

System information

  • Operating system: Arch Linux
  • Cabal:
    • Tested with and affects: 3.8.1.0, 3.10.2.0, 3.10.2.1
    • Tested with and does not affect: 3.6.2.0-p1
  • GHC: 9.4.7
@elldritch
Copy link
Author

Hmm, given the invalid byte sequence, I wonder if this issue is because of:

$ /usr/bin/pkg-config --list-all | grep --text vpl       
vpl                            Intel� Video Processing Library - Accelerated video decode, encode, and frame processing capabilities on Intel� GPUs

I'm not sure what to do fix this though. I can't uninstall libvpl, because it's a transitive dependency of ffmpeg, and that breaks Zoom and Firefox, among others.

@geekosaur
Copy link
Collaborator

What does locale say? The substitution characters there make me suspicious, and perhaps cabal should be forcing UTF-8 there.

@elldritch
Copy link
Author

Is this what you're looking for?

$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

I've been staring at the package in the meantime (see package and PKGBUILD and upstream), and I think the symbol is a "®" symbol. I'm trying to figure out if this is a Cabal problem or a "this package is malformed" problem.

@elldritch
Copy link
Author

Yup, I think I'm newly encountering this problem because of a branding change upstream here which was just packaged into Arch Linux yesterday (see package last updated time).

@geekosaur
Copy link
Collaborator

UTF-8 locale. I wonder why pkg-config is outputting those substitution characters, then.

@elldritch
Copy link
Author

elldritch commented Jan 10, 2024

Yeah, running LANG=en_US.UTF-8 cabal build and LANG=C.UTF-8 cabal build with Cabal 3.10 doesn't work either. Setting LC_ALL=en_US.UTF-8 doesn't help either.

@geekosaur
Copy link
Collaborator

The invalid Unicode is in the upstream vpl.pc.in on GitHub, it turns out.

@elldritch
Copy link
Author

I don't know enough about pkg-config to know here: is the problem that upstream has incorrectly added Unicode into the vpl.pc.in? Or is that actually allowable in .pc files, and the problem is that Cabal needs to parse those characters?

@geekosaur
Copy link
Collaborator

geekosaur commented Jan 10, 2024

They introduced invalid UTF-8 characters into the .pc file, and the Haskell standard library is throwing a decode error when it encounters them that causes cabal to abort.

@elldritch
Copy link
Author

elldritch commented Jan 10, 2024

I've opened an issue with upstream at intel/libvpl#117. I'm pretty sure upstream is at fault here, since man file.pc says "Properties are set using RFC822-style stanzas which consist of a keyword, followed by a colon (:) and then the value the property should be set to", and RFC-822 says that field bodies must be ASCII characters, and "®" is not an ASCII character.

I'll leave this open in case maintainers decide they want to support parsing Unicode characters from pkg-config output, but feel free to close it.

@geekosaur
Copy link
Collaborator

We do parse Unicode; the problem is the characters aren't Unicode. I wouldn't be surprised to find they're ISO8859-1.

@elldritch
Copy link
Author

elldritch commented Jan 11, 2024

Yeah, I think it is. hexdump -C /usr/lib/pkgconfig/vpl.pc says that both characters are ae, which is exactly the (R) symbol in ISO 8859-1 (table).

EDIT: Ah, and also:

$ file --mime /usr/lib/pkgconfig/vpl.pc 
/usr/lib/pkgconfig/vpl.pc: text/plain; charset=iso-8859-1

Okay, will close this issue. Thanks for the assist!

@Bodigrim
Copy link
Collaborator

It would be more robust if Cabal parsed the file as binary ByteString: I imagine we do not care at all about freeform package descriptions, only the first word of each line.

@geekosaur
Copy link
Collaborator

I'm reopening this based on @Bodigrim's comment.

@geekosaur geekosaur reopened this Jan 11, 2024
@gbaz
Copy link
Collaborator

gbaz commented Jan 11, 2024

we call getProgramOutput which in turn calls getProgramInvocationOutput.

We could switch to getProgramInvocationLBS instead, should be a good first ticket for anyone that wants to pick it up.

https://downloads.haskell.org/ghc/latest/docs/libraries/Cabal-3.10.2.0-f1f5/Distribution-Simple-Program-Run.html#v:getProgramInvocationLBS

tomsmeding added a commit to tomsmeding/cabal that referenced this issue Jan 11, 2024
Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has a
the time of writing, 2024-01-11, see haskell#9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  haskell#9608 (comment)
switches to using a lazy ByteString for reading in the output, and
splitting on the first space in byte land, and then parsing only the
package _name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.
tomsmeding added a commit to tomsmeding/cabal that referenced this issue Jan 11, 2024
Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see haskell#9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  haskell#9608 (comment)
switches to using a lazy ByteString for reading in the output, and
splitting on the first space in byte land, and then parsing only the
package _name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.
tomsmeding added a commit to tomsmeding/cabal that referenced this issue Jan 11, 2024
Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see haskell#9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  haskell#9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.
@bezirg
Copy link

bezirg commented Jan 11, 2024

Thank you for this. I am downgrading vpl for now.

Mikolaj pushed a commit to tomsmeding/cabal that referenced this issue Jan 15, 2024
Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see haskell#9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  haskell#9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.
@georgefst
Copy link

georgefst commented Jan 15, 2024

I am downgrading vpl for now.

I'm not much of an Arch expert, I don't know much about what these packages do or why libvpl has replaced onevpl, and I can't guarantee that this won't break anything, but I believe the simplest way to do this is:

sudo pacman -U https://archive.archlinux.org/packages/o/onevpl/onevpl-2023.4.0-1-x86_64.pkg.tar.zst

@elldritch
Copy link
Author

elldritch commented Jan 16, 2024

libvpl replaced onevpl because upstream (Intel) renamed the package, and in doing so they added in a non-ASCII character for the trademark symbol that caused this issue.

I fixed this the lazy way on my machine by just editing /usr/lib/pkgconfig/vpl.pc directly on my machine and removing the (R) from the Name and Description fields.

@dcoutts
Copy link
Contributor

dcoutts commented Jan 22, 2024

Another thing that may be worth trying: it looks like pkg-config also has a flag --list-package-names which should avoid emitting the descriptions (where they invalid unicode was), and it might be a tad quicker too. It's unclear what version of pkg-config this flag was added in (since the changelog is no longer updated, have to check their git log instead).

@bradrn
Copy link

bradrn commented Jan 23, 2024

I’ve independently run into this same problem: https://discourse.haskell.org/t/haskell-wlroots-bindings/8426/46. Again, the culprit was libvpl. For now, I’m working around it by editing /usr/lib/pkgconfig/vpl.pc to remove the offending characters.

@tomsmeding
Copy link
Collaborator

When merged, #9609 will fix this problem. @dcoutts Promising, and clearly the right solution if well-supported (we don't care in the least about the descriptions), but Ubuntu 22.04's pkg-config doesn't seem to support it yet -- and we should probably have support for much older distros than that.

@mergify mergify bot closed this as completed in #9609 Jan 25, 2024
mergify bot added a commit that referenced this issue Jan 25, 2024
* Ignore invalid Unicode in pkg-config descriptions

Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see #9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  #9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.

* Add changelog entry

* cabal-install-solver: Add bounds on 'text'

* No literal ASCII values, use 'ord'

* Address review comments re invalid unicode from pkg-config

* Add test for invalid unicode from pkg-config

* Compatibility with text-1.2.5.0

* Align imports

* Handle different exception type

* Use only POSIX shell syntax

* Add invalid-input handler in pkg-config shim

This is to appease shellcheck

* Actually implement all required stuff in the pkg-config shim

* Less exception dance

* Fix shebang lines

MacOS doesn't have /usr/bin/sh, and /bin/sh is the standard (for a POSIX
shell) anyway

* Don't expect a particular representation of invalid characters

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot pushed a commit that referenced this issue Jan 25, 2024
* Ignore invalid Unicode in pkg-config descriptions

Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see #9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  #9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.

* Add changelog entry

* cabal-install-solver: Add bounds on 'text'

* No literal ASCII values, use 'ord'

* Address review comments re invalid unicode from pkg-config

* Add test for invalid unicode from pkg-config

* Compatibility with text-1.2.5.0

* Align imports

* Handle different exception type

* Use only POSIX shell syntax

* Add invalid-input handler in pkg-config shim

This is to appease shellcheck

* Actually implement all required stuff in the pkg-config shim

* Less exception dance

* Fix shebang lines

MacOS doesn't have /usr/bin/sh, and /bin/sh is the standard (for a POSIX
shell) anyway

* Don't expect a particular representation of invalid characters

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 0b34b4e)

# Conflicts:
#	Cabal/src/Distribution/Simple/Program/Run.hs
Mikolaj pushed a commit that referenced this issue Jan 25, 2024
* Ignore invalid Unicode in pkg-config descriptions

Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see #9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  #9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.

* Add changelog entry

* cabal-install-solver: Add bounds on 'text'

* No literal ASCII values, use 'ord'

* Address review comments re invalid unicode from pkg-config

* Add test for invalid unicode from pkg-config

* Compatibility with text-1.2.5.0

* Align imports

* Handle different exception type

* Use only POSIX shell syntax

* Add invalid-input handler in pkg-config shim

This is to appease shellcheck

* Actually implement all required stuff in the pkg-config shim

* Less exception dance

* Fix shebang lines

MacOS doesn't have /usr/bin/sh, and /bin/sh is the standard (for a POSIX
shell) anyway

* Don't expect a particular representation of invalid characters

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 0b34b4e)
erikd pushed a commit to erikd/cabal that referenced this issue Apr 22, 2024
* Ignore invalid Unicode in pkg-config descriptions

Previously, if any of the pkg-config packages on the system had invalid
Unicode in their description fields (like the Intel vpl package has at
the time of writing, 2024-01-11, see haskell#9608), cabal would crash because
it tried to interpret the entire `pkg-config --list-all` output as
Unicode.

This change, as suggested by gbaz in
  haskell#9608 (comment)
switches to using a lazy ByteString for reading in the output, splitting
on the first space in byte land, and then parsing only the package
_name_ to a String.

For further future-proofing, package names that don't parse as valid
Unicode don't crash Cabal, but are instead ignored.

* Add changelog entry

* cabal-install-solver: Add bounds on 'text'

* No literal ASCII values, use 'ord'

* Address review comments re invalid unicode from pkg-config

* Add test for invalid unicode from pkg-config

* Compatibility with text-1.2.5.0

* Align imports

* Handle different exception type

* Use only POSIX shell syntax

* Add invalid-input handler in pkg-config shim

This is to appease shellcheck

* Actually implement all required stuff in the pkg-config shim

* Less exception dance

* Fix shebang lines

MacOS doesn't have /usr/bin/sh, and /bin/sh is the standard (for a POSIX
shell) anyway

* Don't expect a particular representation of invalid characters

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants