#6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts #8925
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Type of change
Content
I fixed an issue when sharing files that had non-Latin characters would have the file name replaced with underscores. For example, here is a screenshot that shows up when "forwarding" a file that was send in Element Android.
Motivation and context
Here is a link to the issue: #6449
Worthy of note is that I thought about a couple different approaches to fixing this problem. My first regex approach was to use the existing "inclusion" approach, and add Cyrillic and Han scripts. However, after realizing that it could get messy to add support for all the different scripts, I switched to an "exclusion" approach where I remove any known invalid characters.
For reference, here was the first approach
.replace("[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex(), "_")
And version 2
.replace("[\\\\?%*:|\"<>\\s]".toRegex(), "_")
Tests
I also wrote unit tests to verify the new regex works as expected (see the code diff).
Tested devices
Checklist
Signed-off-by: Christian Rowlands <craxiomdev [at] gmail.com>