Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#282 Add strict po loader #283

Merged

Conversation

jonasraoni
Copy link
Contributor

@jonasraoni jonasraoni commented Jul 10, 2022

Hi @oscarotero!

I did some initial updates to cover my needs.

  • I saw it was missing support for the "previous translation comment", so I've added the needed parts to the translator, generator and the StrictPoLoader.
  • About the strictness, I've tried to follow the rules of msgfmt -c (e.g. line break/whitespace is optional almost everywhere) 🫠
  • I was going to enhance the PoLoader, but it became too polluted with my checks and I decided to write a new parser...
  • I barely changed code to handle the comments and the headers.
  • I've tried to avoid using regexps, as the performance drops severely with large texts

@jonasraoni
Copy link
Contributor Author

jonasraoni commented Jul 10, 2022

I've tested some weird combinations, but forgot to move them to the unit test 🤦
Well, I'll wait for some comments (e.g. if it's ok to keep a new class or if you prefer to replace the existing one, etc), then I might add some tests.

@oscarotero
Copy link
Member

Hi.
Thanks for this work!!
I'm not familiarized with the strict mode. What's the difference between strict mode and regular po? This will help me to understand the need to create a new different Po loader.

@jonasraoni
Copy link
Contributor Author

jonasraoni commented Jul 12, 2022

Hi!

There's no "strict" syntax, I just mimicked the behavior of the GNU tools, which I think is a good guide (given the documentation is missing details) :)

For example, the msgfmt works fine with this syntax (the new lines are required just to break comments):

msgctxt"context"msgid"message"msgstr"translation"

But it fails if you try to parse things in the wrong order, with an invalid msgstr index (out of order/out of bounds), bad escaping, etc.

So in theory this could replace the current PoLoader class, but it can introduce surprises (exceptions) for your users if their files have a bad syntax (I've tested this code against 1274 files of my project and all the exceptions I had were genuine).

So I'll wait for your feedback. In order to decrease the pain, I could just add a flag, to swallow the exceptions by default, and continue to the next translation... But the input vs output of a defective file won't be the same as the current class.

@jonasraoni jonasraoni force-pushed the feature/master/282-add-strict-po-loader branch from 078246a to 7981363 Compare July 23, 2022 19:24
… file, added method getWarnings() and introduced new function (loadStringExtended) to avoid breaking the Gettext\Loader\Loader contract
@jonasraoni jonasraoni force-pushed the feature/master/282-add-strict-po-loader branch from b3be188 to 1830ea4 Compare July 24, 2022 10:51
@jonasraoni
Copy link
Contributor Author

jonasraoni commented Jul 26, 2022

Ok @oscarotero, I'm done with the performance fixes (more than that will require me to uglify the code even more or drop features).
In a small dataset with 22 files (564KB) the time went from 4s to .47s, while the PoLoader is taking .25s.

I thought about doing a memory optimization (parse the file by chunks), but that will damage the speed (and my use case doesn't need it: locale files with a max of 170KB) 🤔

If you think it's helpful, I could rebase the commits to mark the tests/BasePoLoaderTestCase.php as a file rename from tests/PoLoaderTest.php, just to improve the code history.

@oscarotero
Copy link
Member

Thanks @jonasraoni

If you think it's helpful, I could rebase the commits to mark the tests/BasePoLoaderTestCase.php as a file rename from tests/PoLoaderTest.php, just to improve the code history.

Yeah, it makes sense. But I think it's not so important, so whatever you want :)

One think that I would like is update the README.md file to include this new loader and explain the differences between both PO loaders (including also the difference in performance). As this info is more in your head, could you please add this latest change before merging?

Thanks so much!

…orLine (switch if the error message should show the byte/line where the error happened) to the StrictPoLoader
@jonasraoni
Copy link
Contributor Author

@oscarotero I've updated the README and the CHANGELOG 👌

@oscarotero oscarotero merged commit 5ee2a4c into php-gettext:master Jul 27, 2022
@oscarotero
Copy link
Member

Thank you! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants