Blog readtime includes inline SVG text content #7367

Closed
4 tasks done
sisp opened this issue Jul 16, 2024 · 4 comments · Fixed by #7370
Labels
change request Issue requests a new feature or improvement resolved Issue is resolved, yet unreleased if open

Comments

@sisp
Contributor

sisp commented Jul 16, 2024

Context

I'm inlining SVG images to be able to use Material for MkDocs' CSS variables for more consistent image styles with the rest of the page.


Slightly off topic but just for completeness: Unlike in the below reproduction (as I want to keep it minimal), I actually wrap the inline SVG in

<div>
  <template shadowrootmode="open">
    <svg ...>
       ...
    </svg>
  </template>
</div>

to avoid leaking the CSS definitions into the document, where they could conflict with other inline SVGs' CSS definitions.

Bug description

The blog readtime plugin extracts any text data from the generated HTML page content including, e.g., CSS definitions in inline SVG images. This leads to a bad estimate of the read time, especially when, e.g., there are many CSS definitions (i.e. much "text").

A fix might involve skipping some tags such as <svg>, <style>, and <script> when gathering text data. I'd be happy to contribute a fix if you agree with the bug report and when we've converged on a solution proposal.

Reproduction

9.5.29-blog-readtime-inline-svg.zip

I've edited venv/lib/python3.12/site-packages/material/plugins/blog/readtime/__init__.py to demonstrate the current behavior of the HTML parser:

     # Extract words from text and compute readtime in seconds
+    print("DEBUG: ", parser.text)
     words = len(re.split(r"\W+", "".join(parser.text)))
     seconds = ceil(words / words_per_minute * 60)

Steps to reproduce

  1. Unzip the reproduction.

  2. Run mkdocs serve.

  3. Observe the line

    DEBUG:  ['Readtime includes inline SVG text', '\n', '\n  ', '\n    ', '\n      ', '\n        <![CDATA[\n        .fill-red {\n          fill: red;\n        }\n        ]]>\n      ', '\n    ', '\n    ', '\n  ', '\n  ', 'Red SVG rectangle', '\n']
    

    in the terminal which shows the list of text data extracted by the HTML parser of the readtime plugin.
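To make the effect on the estimate concrete, here is a small sketch that applies the same word-splitting expression to the list printed above and to a prose-only version of it. The helper names `count_words` and `readtime_seconds` are hypothetical, and the default of 265 words per minute is an assumption about the plugin's configuration, not something taken from this report.

```python
import re
from math import ceil


def count_words(chunks):
    # Same expression as in the readtime plugin excerpt above
    return len(re.split(r"\W+", "".join(chunks)))


def readtime_seconds(chunks, words_per_minute=265):
    # words_per_minute=265 is an assumed default, adjust to your config
    return ceil(count_words(chunks) / words_per_minute * 60)


# Text data as printed by the DEBUG line in the reproduction
extracted = [
    "Readtime includes inline SVG text", "\n", "\n  ", "\n    ", "\n      ",
    "\n        <![CDATA[\n        .fill-red {\n          fill: red;\n"
    "        }\n        ]]>\n      ",
    "\n    ", "\n    ", "\n  ", "\n  ", "Red SVG rectangle", "\n",
]

# What the list would look like if the SVG contents were skipped
prose_only = ["Readtime includes inline SVG text", "\n", "Red SVG rectangle", "\n"]

print(count_words(extracted), count_words(prose_only))
print(readtime_seconds(extracted), readtime_seconds(prose_only))
```

Even in this minimal reproduction the CSS tokens add to the word count; with larger stylesheets the gap grows accordingly.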

Browser

No response


@squidfunk
Owner

Thanks for reporting.

A fix might involve skipping some tags such as <svg>, <style>, and <script> when gathering text data. I'd be happy to contribute a fix if you agree with the bug report and when we've converged on a solution proposal.

Yes, happy to accept a PR here. We need to build some logic to skip adding content when inside specific tags:

class ReadtimeParser(HTMLParser):

    # Initialize parser
    def __init__(self):
        super().__init__(convert_charrefs = True)

        # Keep track of text and images
        self.text = []
        self.images = 0

    # Collect images
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1

    # Collect text
    def handle_data(self, data):
        self.text.append(data)

The search plugin already does that here:

        # Tags to skip
        self.skip = set([
            "object",                  # Objects
            "script",                  # Scripts
            "style"                    # Styles
        ])

        # Ignore self-closing tags
        el = Element(tag, attrs)
        if not tag in void:
            self.context.append(el)
        else:
            return

    # Called for the text contents of each tag
    def handle_data(self, data):
        if self.skip.intersection(self.context):
            return
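
Putting the two excerpts together, a minimal sketch of a readtime parser that ignores text inside selected tags could look like the following. This is not the merged fix: the class name `SkippingReadtimeParser`, the helper `extract_text`, the `SKIP_TAGS` set (including `svg`), and the depth-counter approach are all assumptions made for illustration, chosen as a simpler stand-in for the search plugin's context stack.

```python
from html.parser import HTMLParser

# Assumed set of tags whose text content should not count towards readtime
SKIP_TAGS = {"object", "script", "style", "svg"}


class SkippingReadtimeParser(HTMLParser):

    # Initialize parser
    def __init__(self):
        super().__init__(convert_charrefs=True)

        # Keep track of text and images
        self.text = []
        self.images = 0

        # Number of currently open tags that are in SKIP_TAGS
        self.skip_depth = 0

    # Collect images and track entry into skipped tags
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    # Track exit from skipped tags
    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    # Collect text only when outside of all skipped tags
    def handle_data(self, data):
        if self.skip_depth == 0:
            self.text.append(data)


def extract_text(html):
    # Hypothetical helper: parse a page and return (text chunks, image count)
    parser = SkippingReadtimeParser()
    parser.feed(html)
    parser.close()
    return parser.text, parser.images


page = (
    "<h1>Title</h1>"
    "<svg><style><![CDATA[.fill-red { fill: red; }]]></style>"
    "<text>label</text></svg>"
    '<img src="figure.png">'
    "<p>Red SVG rectangle</p>"
)
print(extract_text(page))
```

A counter is enough here because only the question "am I inside any skipped tag?" matters; self-closing forms such as `<svg/>` stay balanced since `HTMLParser` calls both the start and end handlers for them.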

@squidfunk squidfunk added the change request Issue requests a new feature or improvement label Jul 16, 2024
@sisp
Contributor Author

sisp commented Jul 16, 2024

Thanks for the very helpful pointers to the search plugin! I've submitted a PR.

@squidfunk
Owner

Keeping open until released.

@squidfunk squidfunk reopened this Jul 16, 2024
@squidfunk squidfunk added the resolved Issue is resolved, yet unreleased if open label Jul 16, 2024
@squidfunk
Owner

Released as part of 9.5.30.
