fs/glob doesn't match all filenames with valid UTF-8 names (emoji) on macOS #141

Open
jakedn opened this issue Apr 11, 2025 · 8 comments

@jakedn

jakedn commented Apr 11, 2025

bb --version
babashka v1.12.197

macOS 15.4 (Apple silicon)

I use bb to help me manipulate my notes. I use emoji in filenames and file contents, and after moving from Linux to macOS I realized that glob wasn't returning all the files on my Mac.

You can reproduce it with the following steps (on macOS):

  1. open terminal
  2. mkdir -p ~/Downloads/bb-glob-issue
  3. touch ~/Downloads/bb-glob-issue/'📷 photography.md'
  4. touch ~/Downloads/bb-glob-issue/'🗞️ Article.md'
  5. touch ~/Downloads/bb-glob-issue/'🗣️ talk.md'
  6. touch ~/Downloads/bb-glob-issue/'🤔 interesting things.md'
  7. start a REPL and require babashka.fs
  8. (fs/glob (fs/expand-home "~/Downloads/bb-glob-issue") "*.md")

This only returns '🤔 interesting things.md' and '📷 photography.md'

Is this a GraalVM problem?
It is confusing to me that it works fine on Linux but not on macOS. The shell on the Mac globs as I would expect; only going through babashka.fs/glob gives me this issue.

@borkdude
Contributor

It's a weird issue, but on my MacBook Pro (Intel i9) I see the same results from both the JVM and babashka:

$ bb -e '(fs/glob (fs/expand-home "~/Downloads/bb-glob-issue") "*.md")'
[#object[sun.nio.fs.UnixPath 0x9b0dedf "/Users/borkdude/Downloads/bb-glob-issue/🤔 interesting things.md"] #object[sun.nio.fs.UnixPath 0x2387b5cf "/Users/borkdude/Downloads/bb-glob-issue/📷 photography.md"]]

$ clj -M -e "(require '[babashka.fs :as fs])" -e '(fs/glob (fs/expand-home "~/Downloads/bb-glob-issue") "*.md")'
[#object[sun.nio.fs.UnixPath 0x4c4f4365 "/Users/borkdude/Downloads/bb-glob-issue/🤔 interesting things.md"] #object[sun.nio.fs.UnixPath 0xacf859d "/Users/borkdude/Downloads/bb-glob-issue/📷 photography.md"]]

The JVM I'm using is Oracle GraalVM 24+36.1, so it could be an Oracle GraalVM issue. Let me try another Java version...

@borkdude
Contributor

With Java 17 Temurin I'm getting the same result.

@borkdude
Contributor

I managed to reproduce it in pure Java interop:

user=> (.matches (.getPathMatcher (java.nio.file.FileSystems/getDefault) "glob:*.*") (.toPath (clojure.java.io/file "🗞️ Article.md")))
false
user=> (.matches (.getPathMatcher (java.nio.file.FileSystems/getDefault) "glob:*.*") (.toPath (clojure.java.io/file "Article.md")))
true

Do you see the same?
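
One detail that may be relevant, offered as an observation rather than a confirmed diagnosis: the two names that fail to match appear to contain the variation selector U+FE0F right after the emoji, while the two that match do not. A small sketch to check this, where has-variation-selector? is a hypothetical helper and the escaped strings are assumed to correspond to the file names from the reproduction steps:

```clojure
(require '[clojure.string :as str])

(defn has-variation-selector?
  "True if s contains U+FE0F (VARIATION SELECTOR-16)."
  [s]
  (str/includes? s "\uFE0F"))

;; File names from the reproduction steps, written with escapes so the
;; code points are explicit (assumed to match the original names).
(def names
  ["\uD83D\uDCF7 photography.md"          ; U+1F4F7
   "\uD83D\uDDDE\uFE0F Article.md"        ; U+1F5DE U+FE0F
   "\uD83D\uDDE3\uFE0F talk.md"           ; U+1F5E3 U+FE0F
   "\uD83E\uDD14 interesting things.md"]) ; U+1F914

;; One flag per name, in the same order as `names`.
(def selector-flags (mapv has-variation-selector? names))
;; => [false true true false]
```

If the same split shows up on an affected machine, the selector would be a reasonable detail to mention when reporting the problem upstream.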

@borkdude
Contributor

I posted a question about this on Stack Overflow: https://stackoverflow.com/questions/79568521/glob-not-working-on-paths-with-unicode-in-java

@borkdude
Contributor

I noticed that the **.* pattern does work, but it also matches across subdirectories.

@jakedn
Author

jakedn commented Apr 11, 2025

> Do you see the same?

I don't have java installed on my machine and won't be on a Mac over the weekend.
I'll be sure to check it next week and report back.

Based on your tests it does seem to be more of a Java/JVM issue on macOS. I only noticed this because I moved from Linux to a Mac and all of a sudden bb was skipping files when manipulating them.
For now I have a workaround with list-dir and filtering names in the script.
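
A minimal sketch of that kind of workaround, assuming only a flat directory needs to be listed. list-by-extension is a hypothetical helper name, not part of babashka.fs; it relies on fs/list-dir, fs/regular-file? and fs/file-name and compares names as plain strings, so it should not go through the glob matcher at all:

```clojure
(require '[babashka.fs :as fs]
         '[clojure.string :as str])

(defn list-by-extension
  "Hypothetical helper: regular files directly inside dir whose name
  ends with \".<ext>\", found by plain string comparison, not a glob."
  [dir ext]
  (let [suffix (str "." ext)]
    (->> (fs/list-dir dir)                                  ; direct children only
         (filter fs/regular-file?)                          ; skip directories
         (filter #(str/ends-with? (fs/file-name %) suffix)) ; match on the name
         (sort-by str)                                      ; stable ordering
         vec)))

(comment
  ;; Usage against the directory from the reproduction steps:
  (list-by-extension (fs/expand-home "~/Downloads/bb-glob-issue") "md"))
```

Unlike a glob, this does no pattern matching beyond the suffix check, so anything more elaborate would need its own predicate.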

To be honest, I wasn't expecting such a fast response. I have a lot of appreciation for your work; it makes me feel I chose the right tool in babashka!

@borkdude
Contributor

borkdude commented Apr 11, 2025

Thanks!

The problem only seems to be there on macOS for me as well. I tried Linux and Windows, but those return true for both expressions.

@borkdude
Contributor

I posted a bug report to https://bugreport.java.com/
