Blog readtime includes inline SVG text content #7367

sisp · 2024-07-16T08:54:41Z

Context

I'm inlining SVG images to be able to use Material for MkDocs' CSS variables for more consistent image styles with the rest of the page.

Slightly off topic, but for completeness: unlike in the reproduction below (which I wanted to keep minimal), I actually wrap the inline SVG in

<div>
  <template shadowrootmode="open">
    <svg ...>
       ...
    </svg>
  </template>
</div>

to avoid leaking the CSS definitions into the document, where they could conflict with other inline SVGs' CSS definitions.

Bug description

The blog readtime plugin extracts all text data from the generated HTML page content, including, e.g., CSS definitions in inline SVG images. This inflates the read time estimate, especially when there are many CSS definitions (i.e., a lot of "text").

A fix might involve skipping some tags such as <svg>, <style>, and <script> when gathering text data. I'd be happy to contribute a fix if you agree with the bug report and when we've converged on a solution proposal.
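A minimal sketch of the proposed fix, assuming it is enough to drop text while the parser is nested inside one of the skipped tags. The class name `SkippingReadtimeParser`, the `SKIP_TAGS` set and the `skip_depth` counter are hypothetical and not the plugin's actual code; only the `text` and `images` attributes mirror what the readtime parser collects.

```python
from html.parser import HTMLParser

# Hypothetical set of tags whose text content is not prose (per the proposal)
SKIP_TAGS = {"svg", "style", "script"}


class SkippingReadtimeParser(HTMLParser):
    """Sketch: collect text and count images, ignoring text in SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.images = 0
        self.skip_depth = 0  # > 0 while inside at least one skipped element

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Guard against stray end tags driving the counter negative
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not nested inside a skipped element
        if self.skip_depth == 0:
            self.text.append(data)


parser = SkippingReadtimeParser()
parser.feed("<p>Hello world</p><svg><style>.a{fill:red}</style></svg>")
parser.close()
print(parser.text)
```

A counter is sufficient here because the parser only needs to know whether any skipped tag is open. A self-closing `<svg/>` stays balanced, since `HTMLParser.handle_startendtag` calls `handle_starttag` and then `handle_endtag` by default.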

Reproduction

9.5.29-blog-readtime-inline-svg.zip

I've edited venv/lib/python3.12/site-packages/material/plugins/blog/readtime/__init__.py to demonstrate the current behavior of the HTML parser:

     # Extract words from text and compute readtime in seconds
+    print("DEBUG: ", parser.text)
     words = len(re.split(r"\W+", "".join(parser.text)))
     seconds = ceil(words / words_per_minute * 60)

Steps to reproduce

Unzip the reproduction.
Run mkdocs serve.

Observe the line

DEBUG:  ['Readtime includes inline SVG text', '\n', '\n  ', '\n    ', '\n      ', '\n        <![CDATA[\n        .fill-red {\n          fill: red;\n        }\n        ]]>\n      ', '\n    ', '\n    ', '\n  ', '\n  ', 'Red SVG rectangle', '\n']

in the terminal which shows the list of text data extracted by the HTML parser of the readtime plugin.
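To illustrate how the CSS inflates the estimate, the sketch below reuses the exact list from the DEBUG output and the same `re.split(r"\W+", ...)` tokenisation as the plugin line in the diff. Dropping the chunk that holds the `<style>` contents, to simulate skipping it, is an assumption made only for this comparison.

```python
import re

# Text chunks copied verbatim from the DEBUG output
extracted = [
    'Readtime includes inline SVG text', '\n', '\n  ', '\n    ', '\n      ',
    '\n        <![CDATA[\n        .fill-red {\n          fill: red;\n        }\n        ]]>\n      ',
    '\n    ', '\n    ', '\n  ', '\n  ', 'Red SVG rectangle', '\n',
]


def count_words(chunks):
    # Same tokenisation as the readtime plugin: split on runs of non-word
    # characters. The trailing newline yields an empty final token, which
    # is counted as well.
    return len(re.split(r"\W+", "".join(chunks)))


with_css = count_words(extracted)
without_css = count_words([c for c in extracted if "CDATA" not in c])
print(with_css, without_css)  # 14 9
```

The five extra "words" are `CDATA`, `fill`, `red`, `fill` and `red`, all taken from the SVG's stylesheet.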

Browser

No response

Before submitting

I have read and followed the bug reporting guidelines.
I have attached links to the documentation, and possibly related issues and discussions.
I assure that I have removed all customizations before submitting this bug report.
I have attached a .zip file with a minimal reproduction using the built-in info plugin.


squidfunk · 2024-07-16T11:58:25Z

Thanks for reporting.

> A fix might involve skipping some tags such as <svg>, <style>, and <script> when gathering text data. I'd be happy to contribute a fix if you agree with the bug report and when we've converged on a solution proposal.

Yes, happy to accept a PR here. We need to build some logic to skip adding content when inside specific tags:

mkdocs-material/src/plugins/blog/readtime/parser.py

Lines 28 to 45 in 4f8081c

    
    class ReadtimeParser(HTMLParser):

        # Initialize parser
        def __init__(self):
            super().__init__(convert_charrefs = True)

            # Keep track of text and images
            self.text   = []
            self.images = 0

        # Collect images
        def handle_starttag(self, tag, attrs):
            if tag == "img":
                self.images += 1

        # Collect text
        def handle_data(self, data):
            self.text.append(data)

The search plugin already does that here:

mkdocs-material/src/plugins/search/plugin.py

Lines 383 to 388 in 4f8081c

    
    # Tags to skip
    self.skip = set([
        "object",                  # Objects
        "script",                  # Scripts
        "style"                    # Styles
    ])

mkdocs-material/src/plugins/search/plugin.py

Lines 409 to 414 in 4f8081c

    
    # Ignore self-closing tags
    el = Element(tag, attrs)
    if not tag in void:
        self.context.append(el)
    else:
        return

mkdocs-material/src/plugins/search/plugin.py

Lines 513 to 516 in 4f8081c

    
    # Called for the text contents of each tag
    def handle_data(self, data):
        if self.skip.intersection(self.context):
            return
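Combining the three search plugin excerpts, a readtime parser could keep a stack of open tags and drop text whenever any open tag is in the skip set. This is a hypothetical sketch: `ContextReadtimeParser`, the `VOID` set and the end-tag unwinding are assumptions rather than code from either plugin, and the search plugin pushes `Element` objects, whereas plain tag names are used here.

```python
from html.parser import HTMLParser

# HTML void elements: they never get an end tag, so they must not be pushed
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ContextReadtimeParser(HTMLParser):
    """Sketch: skip text while any tag from `skip` is open."""

    def __init__(self, skip=("object", "script", "style", "svg")):
        super().__init__(convert_charrefs=True)
        self.skip = set(skip)
        self.context = []  # stack of currently open (non-void) tag names
        self.text = []
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        if tag not in VOID:
            self.context.append(tag)

    def handle_endtag(self, tag):
        # Unwind to the matching open tag, tolerating unclosed children;
        # ignore end tags that were never opened
        if tag in self.context:
            while self.context.pop() != tag:
                pass

    def handle_data(self, data):
        # Same check as the search plugin: is any skipped tag open?
        if self.skip.intersection(self.context):
            return
        self.text.append(data)


parser = ContextReadtimeParser()
parser.feed('<p>One <img src="a.png"> two</p><svg><style>.x{fill:red}</style></svg>')
parser.close()
print(parser.text, parser.images)
```

Leaving void elements off the stack matters here: `<img>` or `<br>` never produce an end tag, so pushing them would leave stale entries that are never popped.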

sisp · 2024-07-16T13:21:26Z

Thanks for the very helpful pointers to the search plugin! I've submitted a PR.

squidfunk · 2024-07-16T13:49:28Z

Keeping open until released.

squidfunk · 2024-07-23T07:15:56Z

Released as part of 9.5.30.

squidfunk added the change request Issue requests a new feature or improvement label Jul 16, 2024

sisp mentioned this issue Jul 16, 2024

Fixed blog readtime calculation to ignore non-content text #7370

Merged

squidfunk closed this as completed in #7370 Jul 16, 2024

squidfunk reopened this Jul 16, 2024

squidfunk added the resolved Issue is resolved, yet unreleased if open label Jul 16, 2024

squidfunk closed this as completed Jul 23, 2024
