Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance document #1197

Merged
merged 9 commits into from
Nov 25, 2022
Merged

Enhance document #1197

merged 9 commits into from
Nov 25, 2022

Conversation

NewUserHa
Copy link
Contributor

@NewUserHa NewUserHa commented Nov 19, 2022

This PR tries to update the document and add some details to it.

Found some issues that need to discuss:

  • customCleanUpRe no document.
  • customBadChars no code no use.
  • writeRawJSON no code no use.
  • writeSeriesJSON no code no use. the includeSeriesJSON do the job.
  • useLocalTimezone can be removed since all file time is always use local time.

hope this can make the document better for coming new users

@biggestsonicfan
Copy link
Contributor

What on earth?

@NewUserHa
Copy link
Contributor Author

the document, the readme.md.

those settings are not used at all or having issues.

@Nandaka
Copy link
Owner

Nandaka commented Nov 19, 2022

Is the document you are referring to is the Readme.md?

writeRawJSON it is used in the code and in the config file.

image
image

includeSeriesJSON is the correct one, not writeSeriesJSON ==> only need to update the readme

useLocalTimezone => not in my case see #420 as the API return in utc

customCleanUpRe is used as part of filename sanitizer if user want to use it.
image

customBadChars ==> customBadChars was added on #920, it is still used in the code (refer to the #920)

@NewUserHa
Copy link
Contributor Author

NewUserHa commented Nov 19, 2022

right.
I'm trying set config based on the document, so I found the issues.

writeRawJSON, writeImageJSON

turned both on, but no '..-raw.text' file written, only a json.text from 'writeImageJSON'

writeSeriesJSON

it's created by .py when no config found

useLocalTimezone

system use time in seconds since the epoch (i.e. UTC) always. tzinfo is usually useful only in string format and time calculation.
the reason that '70850475.txt' from that issue with wrong time may be no 'os.utime()' followed after '.close()' at:

info.close()

so useLocalTimezone may be no need.

customCleanUpRe, customBadChars

I think maybe adding a few examples will help.

I want details of the 'checkfilesize' 'overwrite' and 'etc.' too since they are important to control downloading

@NewUserHa
Copy link
Contributor Author

NewUserHa commented Nov 20, 2022

some more useful info that I think can help users and enhance document:

createWebp, webm, apng, gif

with zipfile.ZipFile(ugoira_file) as f:

this line extracted all files from the ugoira. it can directly convert from the source file .zip to all other formats actually, I guess.

webpparam: Parameter to be used to encode webm. default is lossless 0 -q:v 90 -loop 0 -vsync 2 -r 999

"to encode webm" ==> "to encode webp".
while actual argument miss lossless 1(true) at

_config.ffmpegParam = "-vsync 2 -r 999 -pix_fmt yuv420p"

webpparam

webp can use lossless for the best quality by default.

checkLastModified: when found db record and both overwrite and checksize is false, compare the local file's last-modified time with the uploaded time on pixiv.

if config.checkLastModified and os.path.isfile(filename_save) and image is not None:

it should do the time check after finding the file size is the same.

alwaysCheckFileSize: when found db record, if both this and overwrite are false, it will only check if the file exists at the recorded path. otherwise, it'll always do a file size check.

overwrite: need alwaysCheckFileSize to work. when overwrite is true, if the file size is different, it will just delete it and then redownload it.

readme.md Outdated
@@ -528,7 +528,7 @@ Please refer run with `--help` for latest information.
Set to `True` to export the image information to a .XMP sidecar file, one per image in the album. The data contained within the file is the same but some software requires matching file names to detect the metadata. If set to `True`, then `writeImageXMP` is ignored.
- verifyimage

Do image and zip checking after download. Set the value to `True` to enable.
Check if downloaded files is valid image or zip. Set the value to `True` to enable.
Copy link
Contributor

@biggestsonicfan biggestsonicfan Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "files" is plural, the proper term to use would be "are", so Check if downloaded files are a valid image or zip, however this is not what verifyimage actually does. verifyimage simply checks if the file ends in ".jpg", ".png", ".gif", ".ugoira", or ".zip". It does not verify if it is a valid image or zip file.

Copy link
Contributor Author

@NewUserHa NewUserHa Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake.

that adding details is trying to make it easier to tweak PixivUtil2 for experienced users.

actually, verifyimage will validate files:

elif config.verifyImage and filename_save.endswith((".jpg", ".png", ".gif")):
fp = None
try:
from PIL import Image, ImageFile
fp = open(filename_save, "rb")
# Fix Issue #269, refer to https://stackoverflow.com/a/42682508
ImageFile.LOAD_TRUNCATED_IMAGES = True
img = Image.open(fp)
img.load()
fp.close()
PixivHelper.print_and_log('info', ' Image verified.')
except BaseException:
if fp is not None:
fp.close()
PixivHelper.print_and_log('info', ' Image invalid, deleting...')
os.remove(filename_save)
raise
elif config.verifyImage and filename_save.endswith((".ugoira", ".zip")):
fp = None
try:
import zipfile
fp = open(filename_save, "rb")
zf = zipfile.ZipFile(fp)
check_result = None
try:
check_result = zf.testzip()
# Issue #1165
except NotImplementedError as ne:
PixivHelper.print_and_log('warn', f' {ne}')
except RuntimeError as e:
if 'encrypted' in str(e):
PixivHelper.print_and_log('info', ' archive is encrypted, cannot verify.')
else:
raise
fp.close()
if check_result is None:
PixivHelper.print_and_log('info', ' Image verified.')
else:
PixivHelper.print_and_log('info', f' Corrupted file in archive: {check_result}.')
raise PixivException(f"Incomplete Downloaded for {url}", PixivException.DOWNLOAD_FAILED_OTHER)
except BaseException:
if fp is not None:
fp.close()
PixivHelper.print_and_log('info', ' Image invalid, deleting...')
os.remove(filename_save)
raise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so it does, my mistake

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that the overwrite and alwaysCheckFileSize have too much tech debt.
and not sure how to detail those

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, it is from 2015 and no major rewrite at all. As long it is working as my expectation, I don't plan to do anything as long it doesn't break.

@Nandaka
Copy link
Owner

Nandaka commented Nov 22, 2022

also send a lot of changes in one big pull request, i might not be able to understand it 😄

@NewUserHa
Copy link
Contributor Author

This PR will try to improve the document only, to make both new and experienced users clear with some details.

@NewUserHa NewUserHa requested review from Nandaka and removed request for Nandaka November 23, 2022 11:26
@NewUserHa
Copy link
Contributor Author

The rest of this PR is done. Except the customCleanUpRe, but I don't use those settings and didn't read those code.

@Nandaka Can you add its document?

@biggestsonicfan
Copy link
Contributor

Did you notice r18Type isn't documented? The information on it can be found in #439.

@NewUserHa
Copy link
Contributor Author

@cglmrfreeman it's updated now.
There're many options I don't use. You can also suggest any option I didn't see to the PR directly.

@@ -223,7 +223,7 @@ def proxy(self):
value = getattr(self, "proxyAddress", None)
if not value:
return None
match = re.match(r"^(?:(https?|socks[45])://)?([\w.-]+)(:\d+)?$", value)
match = re.match(r"^(?:(https?|socks[45]h?)://)?([\w.-]+)(:\d+)?$", value)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add check for socks4a too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can. But I don't have any socks4a server for test.

@Nandaka Nandaka merged commit 3771035 into Nandaka:master Nov 25, 2022
@NewUserHa NewUserHa mentioned this pull request Nov 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants