Display all metadata in debug log level #155

Closed
benoit74 opened this issue Apr 19, 2024 · 3 comments · Fixed by #172
Labels
enhancement New feature or request

Comments

@benoit74
Collaborator

As discussed in openzim/warc2zim#123, we would benefit from logging the metadata that is used, at least all text values.

Regarding the illustration, do we want to log the base64 value? It might be useful for debugging as well, but its impact on log size is not always negligible.

I recommend doing it right at the beginning of the start method, before checking for the presence of mandatory metadata and before any validation, so that it is always logged.
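To illustrate the ordering, here is a minimal sketch. `SketchCreator`, `MANDATORY_KEYS` and the log format are hypothetical stand-ins made up for this example, not the actual python-scraperlib code; the only point is that logging happens before any check that could raise.

```python
import logging

logger = logging.getLogger("sketch")

# Hypothetical stand-in, NOT the real python-scraperlib Creator.
MANDATORY_KEYS = ("Title", "Language")  # illustrative subset only


class SketchCreator:
    def __init__(self, metadata: dict[str, object]):
        self.metadata = metadata

    def start(self) -> "SketchCreator":
        # Log first, so the values are visible even if a check below fails.
        for name, value in self.metadata.items():
            if isinstance(value, str):
                logger.debug("Metadata: %s = %s", name, value)

        # Only then check that mandatory metadata is present.
        missing = [key for key in MANDATORY_KEYS if key not in self.metadata]
        if missing:
            raise ValueError(f"Missing mandatory metadata: {', '.join(missing)}")
        return self
```

With this ordering, a scraper that forgets a mandatory value still gets the debug lines for everything it did set before the exception is raised.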

@rgaudin @kelson42 WDYT?

@richterdavid do you confirm you wanna implement this issue? Please wait a little for the discussion to settle here before starting any implementation; we need to confirm that everyone is on the same page.

@richterdavid

@richterdavid do you confirm you wanna implement this issue?

Happy to.

@richterdavid

@benoit74 by start() you meant Creator.start here?

How about setting a command-line flag for how much of the illustration to log? Default it to something (e.g., 100 bytes), and support two sentinel values representing "nothing" and "everything".

@benoit74
Collaborator Author

by start() you meant Creator.start here?

Yes

How about setting a command-line flag for how much of the illustration to log? Default it to something (e.g., 100 bytes), and support two sentinel values representing "nothing" and "everything".

Python-scraperlib is a library, so there is no such thing as a command-line flag. But we could add an argument, e.g. to the Creator.__init__() method.

But logging only the first 100 bytes makes little sense to me: it has little value. At most it could be used to check the mime type, in which case I would rather log only the illustration mime type (python-scraperlib already has everything needed to detect it). Logging nothing is then not a big win, and I don't expect scrapers to want this alternative (one extra small log line usually does not matter). And logging everything, if optional, is better done directly in the scraper rather than in python-scraperlib.

So to sum up: I propose to log all raw metadata, except for the illustration, where we log only its mime type. And no new argument to any function.
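A rough sketch of what this could look like. All names here are hypothetical: `guess_mimetype` is a tiny stand-in for whatever mime detection python-scraperlib actually provides, the `Illustration_` key prefix is an assumption about how illustration metadata is named, and the log format is made up.

```python
import logging

logger = logging.getLogger("sketch")


def guess_mimetype(content: bytes) -> str:
    """Hypothetical stand-in for the library's own mime detection."""
    if content.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if content.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return "application/octet-stream"


def describe_metadata(name: str, value: object) -> str:
    """Build the log line for one metadata entry."""
    # Assumption: illustrations are bytes stored under "Illustration_..." keys.
    if name.startswith("Illustration_") and isinstance(value, bytes):
        # Never dump the image itself, only its detected mime type.
        return f"Metadata: {name} = <{guess_mimetype(value)}>"
    return f"Metadata: {name} = {value!r}"


def log_metadata(metadata: dict[str, object]) -> None:
    """Log every metadata entry at debug level, with no extra argument."""
    for name, value in metadata.items():
        logger.debug(describe_metadata(name, value))
```

A scraper that really wants the full base64 value in its logs can still produce it itself, without any new option in the library.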

WDYT?

2 participants