#6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts #8925

Open
wants to merge 3 commits into
base: develop
from

Conversation

christianrowlands

Type of change

  • Feature
  • Bugfix
  • Technical
  • Other :

Content

I fixed an issue where sharing a file whose name contains non-Latin characters replaced those characters with underscores. For example, here is a screenshot of what appears when "forwarding" a file that was sent in Element Android.

image

Motivation and context

Here is a link to the issue: #6449

Worth noting: I considered a couple of different approaches to fixing this problem. My first approach was to keep the existing "inclusion" regex and add the Cyrillic and Han scripts to it. However, after realizing that adding support for every script could get messy, I switched to an "exclusion" approach that replaces only known invalid characters.

For reference, here was the first approach:

.replace("[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex(), "_")

And version 2:

.replace("[\\\\?%*:|\"<>\\s]".toRegex(), "_")
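To make the difference between the two approaches concrete, here is a minimal standalone sketch. It is not the code from the diff: the helper names `sanitizeByInclusion` and `sanitizeByExclusion` and the sample file names are hypothetical, and only the two regex literals are taken from the snippets above.

```kotlin
// Hypothetical helpers for illustration; only the regex literals come from the PR description.
private val INCLUSION_REGEX = "[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex()
private val EXCLUSION_REGEX = "[\\\\?%*:|\"<>\\s]".toRegex()

// Approach 1: replace every character that is NOT in the allowed set.
fun sanitizeByInclusion(name: String): String = name.replace(INCLUSION_REGEX, "_")

// Approach 2: replace only characters that are in the known-invalid set.
fun sanitizeByExclusion(name: String): String = name.replace(EXCLUSION_REGEX, "_")

fun main() {
    // Cyrillic and Han names pass through both versions unchanged.
    println(sanitizeByInclusion("отчёт.pdf"))    // отчёт.pdf
    println(sanitizeByExclusion("文件.txt"))      // 文件.txt

    // A script that was not explicitly listed (Greek here) is still
    // flattened by the inclusion regex, but preserved by the exclusion regex.
    println(sanitizeByInclusion("αβγ.txt"))      // ___.txt
    println(sanitizeByExclusion("αβγ.txt"))      // αβγ.txt

    // The inclusion set allows spaces; the exclusion set treats whitespace as invalid.
    println(sanitizeByInclusion("my file?.txt")) // my file_.txt
    println(sanitizeByExclusion("my file?.txt")) // my_file_.txt
}
```

The Greek sample shows why the inclusion list gets messy: each additional script needs its own `\p{sc=...}` entry. One behavioral difference to keep in mind is that the exclusion class as quoted above does not contain `/`, so a forward slash passes through version 2 unchanged, whereas the inclusion regex replaces it.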

Tests

  1. I sent a file containing Cyrillic characters in Element Web.
  2. I viewed that message in Element Android.
  3. I clicked the share button for that file.
  4. I verified that the file name in the share UI was not all underscores.

I also wrote unit tests to verify the new regex works as expected (see the code diff).

Tested devices

  • Physical
  • Emulator
  • OS version(s): Android 15 and Android 5.1

Checklist

Signed-off-by: Christian Rowlands <craxiomdev [at] gmail.com>

@christianrowlands
Author

@bmarty, I have another simple PR for you if you can take a look at it. If you have any objections to the new regex, I am happy to update it as necessary or add more tests to cover different scenarios.

@christianrowlands christianrowlands changed the title from "#6449" to "#6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts" Oct 16, 2024
@element-hq element-hq deleted a comment from logman12oge Oct 25, 2024
@CLAassistant

CLAassistant commented Nov 1, 2024

CLA assistant check
All committers have signed the CLA.
