Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 155: log metadata prior to verification #160

Closed
wants to merge 7 commits into from

Conversation

richterdavid
Copy link

@richterdavid richterdavid commented May 2, 2024

Fix #155

There's a few things not working right yet, and this isn't passing pre-commits yet, so I'm pushing this just for advice at this time.

Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to give you as much help as possible. Hope it will be enough.

src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
tests/zim/test_zim_creator.py Outdated Show resolved Hide resolved
tests/zim/test_zim_creator.py Outdated Show resolved Hide resolved
@richterdavid richterdavid marked this pull request as ready for review May 16, 2024 22:30
@richterdavid
Copy link
Author

Now passing tests and precommits.

@benoit74 benoit74 self-requested a review May 21, 2024 06:14
Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, good start, of course few things left to fix.

Please also add an entry to the CHANGELOG.md

src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
tests/zim/test_zim_creator.py Outdated Show resolved Hide resolved
tests/zim/test_zim_creator.py Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
tests/zim/test_zim_creator.py Show resolved Hide resolved
@benoit74 benoit74 self-requested a review May 30, 2024 10:09
Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! @rgaudin could you please do a final review should I have missed something?

@benoit74 benoit74 requested a review from rgaudin May 30, 2024 10:10
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ; see my few suggestions.

Here's what I have in zimls. Goal is different than yours but you get the idea.

    print("Metadata:")
    for name in zim.metadata_keys:
        item = zim.get_metadata_item(name)
        if item.mimetype.startswith("text/plain"):
            try:
                preview = bytes(item.content).decode("UTF-8")
            except UnicodeDecodeError as exc:
                preview = bytes(item.content)
        elif as_base64:
            preview = (
                f"{item.mimetype} ({item.size} bytes) binary: {to_base64(item.content)}"
            )
        else:
            preview = f"{item.mimetype} binary ({item.size} bytes)"
        print(f" - {name}: {preview}")

src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
@rgaudin
Copy link
Member

rgaudin commented May 31, 2024

@richterdavid I see you've resolved all conversations. Is it ready for re-review?

Also, our convention is that the PR submitter only resolves a conversation should it have been fixed in code directly.
If the conversation was a question or if it leads to an explanation or a different way to address, then it's to the reviewer (who started the conversion) to resolve.

@richterdavid
Copy link
Author

Yes, please do re-review.

I think my resolves complied with that convention. Feel free to reopen any you see fit.

@rgaudin rgaudin self-requested a review June 6, 2024 08:18
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, see my inline comments

@@ -146,7 +150,35 @@ def config_indexing(
self.__indexing_configured = True
return self

def _is_illustration_metadata_name(self, name: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using it once and it's a very simple check ; why the extra function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving it a name is a way of making the code a little more self-documenting. Really the prefix should be put somewhere shared and reused as well, but I didn't want to edit all the other places the constant is used in this change.

def _log_metadata(self):
for name, value in sorted(self._metadata.items()):
if self._is_illustration_metadata_name(name) and isinstance(value, bytes):
illus_md = self._get_illustration_metadata_details(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not logging directly? Then you have all the flexibility you want

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging illus_md inside of get_illustration_metadata_details? I could, but it would spread out knowledge of the metadata logging format. It's preferable to keep that format info in one function.

"""Return True if name is a valid illustration metadata name"""
return name.startswith("Illustration_")

def _get_illustration_metadata_details(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is not very appealing. You want to know the image format and its dimensions. Why not returning this instead of building a string in some util function?
As per error handling, you have the choice to handle it when calling or inside the function but anyway, better safe than sorry and handle every exception. Currently if PIL cannot read (and it's not UnidentifiedImageError or if size could not be set to a tuple, it will crash

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, now it won't crash quite as easily. Agreed that this could return a richer data-structure. Doing so would complicate the caller. Avoiding the complication in the caller by pushing the logging into this function smears the knowledge of the logging format in use around. It also prevents converting this whole thing from N logging calls to 1 call contains N items; the latter may be preferable if functions that do other logging end up being called, causing the metadata logging to be spread out rather than in a block.

for name, value in sorted(self._metadata.items()):
if self._is_illustration_metadata_name(name) and isinstance(value, bytes):
illus_md = self._get_illustration_metadata_details(value)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will log everything that's not bytes. What if it cannot be represented as string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would indeed be unfortunate. The preferred ways to set up metadata, though, go through typed handlers that ensure it will be bytes. Worst case, IIUC, is that it gets logged as is. That might be annoying if it's a 47K character string or somesuch, but it's not terrible.

src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
@benoit74
Copy link
Collaborator

@richterdavid we would like to release 3.4.0 in the coming days, we would appreciate if you could achieve fix the few remaining conversations since 3.4.0 ; otherwise this PR will be postponed to 3.5, which is not a concern.

@richterdavid
Copy link
Author

Please take another look.

@benoit74 benoit74 requested a review from rgaudin June 11, 2024 06:26
@rgaudin rgaudin requested a review from benoit74 June 11, 2024 07:49
@benoit74
Copy link
Collaborator

@rgaudin could you maybe propose a commit with what seems reasonable to fix your points. It looks like we are stuck in a silly conversation on "best design" which I've never seen leading to anywhere but lengthy debates. Since we are the maintainers of this lib on the long term (we will have to "pay" for improper design) and since you have more experience on this project, I think that you can deserve to have the "last word".

@rgaudin
Copy link
Member

rgaudin commented Jun 18, 2024

Will do

@rgaudin
Copy link
Member

rgaudin commented Jun 20, 2024

See #172 as I can't push here

benoit74 added a commit that referenced this pull request Jun 21, 2024
 Issue 155: log metadata prior to verification (PR #160 fixes)
@benoit74
Copy link
Collaborator

Superseeded by #172

@benoit74 benoit74 closed this Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Display all metadata in debug log level
3 participants