Adds feature to strip HTML from captions #1045
Merged
Because Pixiv allows some formatting in image descriptions, the HTML tags remain in the caption, and they can mess up how some descriptions are displayed in image software.
With `stripHTMLTagsFromCaption` enabled, the HTML tags are removed using BeautifulSoup.
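As a rough illustration of what stripping tags from a caption means, here is a minimal sketch. It is not the code in this PR: the PR uses BeautifulSoup, while this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies. The helper name `strip_html_tags` is hypothetical, and turning `<br>`/`<br />` into newlines is an assumption about how line breaks in captions should be kept (BeautifulSoup's `get_text()` does not do this by default).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text of an HTML fragment, dropping every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: keep line breaks readable by mapping <br> to a newline.
        # <br /> also ends up here via HTMLParser's default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Text between tags is kept; the tags themselves (and their attributes,
        # e.g. an <a> tag's href) are discarded.
        self.parts.append(data)


def strip_html_tags(caption):
    """Hypothetical helper: return the caption with all HTML tags removed."""
    parser = _TextExtractor()
    parser.feed(caption)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_html_tags('Hello<br />see <a href="https://example.com">my site</a> &amp; more'))
```

Note that the `href` of the link is gone from the output, which is exactly the data loss described next.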
This does mean that some data might be lost, e.g. the actual URL of a link, because it is stored inside the tag itself. For URLs, though, I made sure they are collected by `writeUrlInDescription` before `stripHTMLTagsFromCaption` removes the tags, so they are still available in the rest of the metadata.

Also, I've removed Exiv2 from the list of dependent software, as it is not needed with the new pyexiv2 library implemented earlier. However, the Visual C++ Redistributable is required, and it is not installed on Windows by default.
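The ordering point above (collect the URLs first, strip the tags second) can be sketched as follows. This is a hypothetical illustration rather than the PR's implementation: `collect_urls` is an invented name, and it uses the standard library's `html.parser` instead of BeautifulSoup. It records every `href` on an `<a>` tag so the links survive even after the caption's tags are removed.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Records the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and unescapes
        # attribute values (e.g. &amp; -> &), so <A HREF=...> is matched too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def collect_urls(caption):
    """Hypothetical helper: return the link targets found in a caption."""
    parser = _LinkCollector()
    parser.feed(caption)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    caption = 'see <a href="https://example.com/a?x=1&amp;y=2">x</a> and <A HREF="https://example.org">y</A>'
    # Run this on the raw caption, before any tag stripping.
    print(collect_urls(caption))
```

The design choice is simply ordering: once the tags are stripped, the `href` values no longer exist in the text, so any URL extraction has to run on the original HTML caption.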