Skip to content

Conversation

sdx23
Copy link
Contributor

@sdx23 sdx23 commented Jan 8, 2022

Metadata is always nice to have, but having it, we should also use it. Specifically, set the file mtime to the document date. Actually, I even think that should be the default option. The download time is still available in other attributes, but I don't really see where it'd be important.

This might just as well fit into rename_after_download which should then be postprocess_after_download, but I had no strong preference so it landed in cli.py.

Up to now I didn't check whether the date attribute acutally exists. I guess it should for any document.

Regarding click options: I find it strange to supply "=true" to bool options. Imho such options should be simple switches to enable the non-default behaviour (e.g. also in --headless) -- which is common for a lot of other programs. However, for now I've kept with the existing style.

@heeplr
Copy link
Owner

heeplr commented Jan 8, 2022

Metadata is always nice to have, but having it, we should also use it. Specifically, set the file mtime to the document date. Actually, I even think that should be the default option. The download time is still available in other attributes, but I don't really see where it'd be important.

I think the download time is crucial. By manipulating it, you actually destroy metadata. There are two main reasons for this:

  1. document-dl mimics manual download and the user expects the mtime as set by the browser.
  2. If you think of the document as an actual paper document, you are replacing the postal date on the envelope with the date on the actual document. The former represents the time it was sent and the latter the time it was written. (Those are used to prevent dating a document back and claiming it was sent long ago). Similarly, service providers store the download time as "seen" timestamp in their database to mark the time, when the user has received the document. The same is true for the file's mtime attribute. By setting it to the time, the document was created you loose the time it was downloaded.

I strongly suggest to use the json metadata that is output by document-dl during/after download and do any postprocessing after document-dl (a simple shellscript that parses piped json using jq and sets mtime using touch should do the trick)

Up to now I didn't check whether the date attribute acutally exists. I guess it should for any document.

That's another problem, actually the same with global temporal filter arguments - not every plugin provides dates and document-dl behaviour would be inconsistent.

Regarding click options: I find it strange to supply "=true" to bool options. Imho such options should be simple switches to enable the non-default behaviour (e.g. also in --headless) -- which is common for a lot of other programs. However, for now I've kept with the existing style.

I agree and I will think of a good solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants