Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

React too eagerly sanitizes HTML entities #6718

Closed
ivan-kleshnin opened this issue Mar 19, 2019 · 7 comments
Closed

React too eagerly sanitizes HTML entities #6718

ivan-kleshnin opened this issue Mar 19, 2019 · 7 comments
Labels
Upstream Related to using Next.js with a third-party dependency. (e.g., React, UI/icon libraries, etc.).

Comments

@ivan-kleshnin
Copy link
Contributor

ivan-kleshnin commented Mar 19, 2019

Now-2 and Next-8 require us to drop server.js-based approach to serve SSR in production, and replace it with lambdas and route configs. I fully understand the reasoning behind this and I don't question it. Hovewer, the server.js file was useful not only to apply routing but to inject middlewares! Middlewares were necessary, for example, to fix a bunch of annoying problems with React.

  1. React title encodes HTML entities

facebook/react#13838
facebook/react#3879

See fix-1 below.

  1. Linebreaks in meta tags and stuff like that

See fix-2 below.

import Express from "express"
import Interceptor from "express-interceptor"
import {AllHtmlEntities} from "html-entities"

let ent = new AllHtmlEntities()
let metaContentFixer = Interceptor((req, res) => ({
  isInterceptable() {
    return /text\/html/.test(res.get("Content-Type"))
  },
  intercept(body, send) {
    send(normalizeHTML(body.toString()))
  }
}))

app
  .prepare()
  .then(() => {
    let server = Express()
    server.use(metaContentFixer)
    ...
  })

function normalizeHTML(html) {
  return html
    // Fix SEO title
    .replace(/<title[^>]*>([^<]*)<\/title>/, (match, content) => {
      return match.replace(content, ent.decode(content))
    }) // fix-1
    // Remove linebreaks in meta contents
    .replace(/content="([^"]+)"/g, (match, content) => {
      return `content="${ent.decode(content).replace(/(\r\n|\n|\r)+/g, " ")}"`
    }) // fix-2
}

You can say "meh, it's just React bugs". The problem is that such bugs can be very long-lasting and quite detremental to SEO (ruining og attributes for example).

And the list of such issues can be expanded. Not necessary NextJS-related but I remember fixing HTML after ReactRouter + Helmet and so on. The bottom line is that similar issues and answers like "You can only fix it in HTML" are regular at StackOverflow.

server.js allowed us to inject whatever middlewares we wanted. Unless I miss something, Now-2 + Next-8 do not leave any option to apply custom middlewares. It's quite a limitation, you know.

@Timer Timer added Type: Bug Upstream Related to using Next.js with a third-party dependency. (e.g., React, UI/icon libraries, etc.). labels Mar 19, 2019
@Timer Timer changed the title server.js replacement React too eagerly sanitizes HTML entities Mar 19, 2019
@ivan-kleshnin
Copy link
Contributor Author

ivan-kleshnin commented Mar 19, 2019

@Timer I think the new title is slightly misleading. I used React bug just as a demonstration of a bigger problem. I don't think it's really possible to scale (in a business sense) SSR app without middlewares. @now/node supports middlewares. @now/next should somehow support them as well.

@ivan-kleshnin
Copy link
Contributor Author

ivan-kleshnin commented Mar 20, 2019

Another consideration.

With next.config.js:routes we still need server.js for a local dev. Which requires either to duplicate code or to add a transformation step from JSON to Express routes. With middlewares, the situation complicates. Obviously, we'll need to share some middlewares between dev and prod, but not the others... Which means we need to either revert to server.js stage or... use a special lambda framework.

AWS ended up with Serverless. As Zeit is already on the path of abandoning Docker and the benefit of "No need to rewrite code" is no longer a selling point, it seems like the next logical step to me.

@Timer
Copy link
Member

Timer commented May 24, 2019

Closing this because this is semantically valid HTML -- can you please provide a case where this doesn't work?

https://codesandbox.io/s/6xn18

For example, the following code opens the link as you'd expect without &amp; in the slug:

<a target="_blank" href="http://example.com/?test=1&amp;test2=test">Testing</a>

Also, "Remove linebreaks in meta contents" sounds better served by fixing it in the place you're rendering a <meta /> tag.

@Timer Timer closed this as completed May 24, 2019
@ivan-kleshnin
Copy link
Contributor Author

ivan-kleshnin commented May 24, 2019

Consider SEO titles where you want

<title>X & Y</title>

instead of

<title>X &amp;</title>

because:

  1. the latter can be treated differently by search engines
  2. SEO titles have effective length limitation

Although many people claim the &amp; is ok in SEO titles...
I don't have time to investigate this further so if you think it's not a bug – I'm fine.

@ivan-kleshnin
Copy link
Contributor Author

ivan-kleshnin commented May 24, 2019

For the record, this issue started when we saw

Foo &amp; Bar

like results in the Google SERP (search engine result page)!

By logic, Google has to decode HTML entities, yes. But somehow it didn't (at least that time).

@Timer
Copy link
Member

Timer commented May 27, 2019

This should probably be filed as a bug with Google -- if this still reproduces we can likely help ping people.

@balazsorban44
Copy link
Member

This issue has been automatically locked due to no recent activity. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@vercel vercel locked as resolved and limited conversation to collaborators Jan 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Upstream Related to using Next.js with a third-party dependency. (e.g., React, UI/icon libraries, etc.).
Projects
None yet
Development

No branches or pull requests

3 participants