Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/4459/excerpt formatting ignored #9716

Merged
merged 28 commits into from
Nov 27, 2018

Conversation

nadrane
Copy link
Contributor

@nadrane nadrane commented Nov 5, 2018

This is the beginnings of a fix for #4459

Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the pruneLength setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.

I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:

  1. The work in the excerpt resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.

  2. We are invoking getMarkdownAST (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

  3. I tore apart the getAST function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

I'd love to hear people's thoughts.

@pieh
Copy link
Contributor

pieh commented Nov 5, 2018

Did you check #5586 which tackles same problem?

2. We are invoking getMarkdownAST (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

There's no way around it I think. Unless excerpt separator is visible in markdown AST, then we could use same AST and break manually on separator (but I don't think it's there)

3. I tore apart the getAST function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

🙏 At some point I would like to have cache.wrap that would exactly what new version of getAST does but hidden from end user (kind of like _.memoize), so you can wrap your async function in cache.

@nadrane
Copy link
Contributor Author

nadrane commented Nov 5, 2018

There's no way around it I think. Unless excerpt separator is visible in markdown AST, then we could use same AST and break manually on separator (but I don't think it's there)

I can look to see if the separator is in the AST...

One alternative is that we could just not run plugins on the extract. This is an unfortunate limitation, though it's definitely better than the current situation and might be better than potentially breaking all/half of the plugins that use the gatsby-transformer-remark API. How would you feel about this approach?

@nadrane
Copy link
Contributor Author

nadrane commented Nov 5, 2018

So the comment does make its way into the markdown AST. With that said, removing it (and everything that follows it) is not trivial. At first I thought it would be simple, but as I looked at the markdown AST data structure a bit closer, I realized that it's storing position relative to the original text. Here is an example:

node { type: 'root',
        children:
         [ { type: 'paragraph', children: [Array], position: [Object] },
           { type: 'html', value: '<!-- end -->', position: [Object] },
           { type: 'paragraph', children: [Array], position: [Object] },
           { type: 'paragraph', children: [Array], position: [Object] } ],
        position:
         { start: { line: 1, column: 1, offset: 0 },
           end: { line: 6, column: 5, offset: 1013 } } }

You can see that the AST stores the offsets for the beginning and ending of every node. If I were to remove a subset of nodes from the AST, I would need to update all these positions. Unfortunately, as far as I can tell, there aren't any helper utilities available online to make these AST updates for me.

I think I'm going to continue with the current approach and rely on calling getAST twice.

@nadrane
Copy link
Contributor Author

nadrane commented Nov 10, 2018

This PR is mostly ready aside from updating some snapshots. The code should be pretty much finalized. The tests need a little help.

Let me give you an overview of what I've done

  1. I was able to successfully parse the markdown AST for the excerpt_separator, allowing us to avoid calling getAST twice and risking calling plugins multiple times. I did this by placing the excerpt_separator on the markdownNode. I couldn’t figure out another way to make it accessible to the extend-node-type file. If there is a better strategy, I’d love to learn about it.

  2. I had to write quite a bit of code to walk the html AST. Given the amount of caching similar prior code does, I got the impression that it is particularly performance sensitive. I’m doing quite a bit of tree traversal, sometimes invoking the same function multiple times. This could be optimized with some simple memoization if you think that would be valuable. My suspicion is it’s probably fine without it, though (unless maybe an excerpt were particularly long and full of many nodes).

  3. I wasn’t sure how to get the gatsby-node tests passing again after changing the way excerpts are calculating when an excerpt_separator is present. These tests are clearly not calling the extend-node-type code, where the excerpt_separator is now handled, causing 2 tests to fail. I would love some guidance on this. Or, if it’s ok, I could also just remove the tests, though it’s not clear to me if we would then be sacrificing some degree of test coverage

  4. The changes are slightly breaking. Even if a user previously never used markdown with their excerpts, we at the very least wrap ever excerpt in a <p> tag. This could definitely break styling on existing websites. I don't really see another option, however

Looking forward to hearing your thoughts

@pieh
Copy link
Contributor

pieh commented Nov 10, 2018

I couldn’t figure out another way to make it accessible to the extend-node-type file. If there is a better strategy, I’d love to learn about it.

You should have access to it via pluginOptions.excerpt_separator (to not copy it to every markdown node)

3. I wasn’t sure how to get the gatsby-node tests passing again after changing the way excerpts are calculating when an excerpt_separator is present. These tests are clearly not calling the extend-node-type code, where the excerpt_separator is now handled, causing 2 tests to fail. I would love some guidance on this. Or, if it’s ok, I could also just remove the tests, though it’s not clear to me if we would then be sacrificing some degree of test coverage

4. The changes are slightly breaking. Even if a user previously never used markdown with their excerpts, we at the very least wrap ever excerpt in a <p> tag. This could definitely break styling on existing websites. I don't really see another option, however

Yeah, this would be breaking change so would need major version bump (if merged as-is). Will think if we should do intermediate step of keeping old field (and deprecate it) and add new field (i.e excertptHtml with your code behind it). There's lot of content using <p>${markdownRemark.frontmatter.excerpt}</p> around and with this change we would need to use dangerouslySetInnerHTML, so transition period would be needed here.

@nadrane
Copy link
Contributor Author

nadrane commented Nov 10, 2018

You should have access to it via pluginOptions.excerpt_separator (to not copy it to every markdown node)

I was able to access the excerpt_separator using pluginOptions in the on-node-create file but not in the extend-node-type file. In the latter file it was empty. It sounds like you're saying this is not the expected behavior.

Yeah, this would be breaking change so would need major version bump (if merged as-is). Will think if we should do intermediate step of keeping old field (and deprecate it) and add new field (i.e excertptHtml with your code behind it). There's lot of content using

${markdownRemark.frontmatter.excerpt}

around and with this change we would need to use dangerouslySetInnerHTML, so transition period would be needed here.

Let me know how you'd like to approach it. I'm fine creating a new GraphQL field in the interim.

@pieh
Copy link
Contributor

pieh commented Nov 10, 2018

I was able to access the excerpt_separator using pluginOptions in the on-node-create file but not in the extend-node-type file. In the latter file it was empty. It sounds like you're saying this is not the expected behavior.

was it happening in jests tests or when you were running gatsby with your changes applied?

@nadrane
Copy link
Contributor Author

nadrane commented Nov 11, 2018

was it happening in jests tests or when you were running gatsby with your changes applied?

Yeah it was happening through Jest tests. Sounds like they aren't set up quite correctly?

@pieh
Copy link
Contributor

pieh commented Nov 11, 2018

Yeah it was happening through Jest tests. Sounds like they aren't set up quite correctly?

Seems like it -

here we would need to pass pluginOptions (which includes excerpt_separator)

I guess this wasn't needed before because pluginOptions weren't used in tested code

@nadrane
Copy link
Contributor Author

nadrane commented Nov 12, 2018

Ah, that makes sense. Thank you.

As far as a final decision on this PR goes, are you settled on creating a excerptHTML field and leaving the existing excerpt field as-is? If so, I've got a plane ride in a few hours and will probably do the work then.

@nadrane
Copy link
Contributor Author

nadrane commented Nov 16, 2018

@pieh Hey, just checking in, what are you thinking for a final decision on this one?

Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few notes:

I think this is a great change. I've personally run into some weirdness with this, so I make an excerpt field on my frontmatter that I pull from first and then fallback to excerpt made available automatically on the markdown node.

I think there are a few approaches here, all valid/useful in different ways. The overarching principle here should be that users could be relying on old behavior (e.g. using excerpt for plain text, and then getting HTML) so we can't do this cleanly without a breaking change and a major version bump which is OK but not necessarily ideal.

To avoid that, I have a few ideas:

  • Use something like remove-markdown to remove markdown characters
    • This should (🤞) not really be a breaking change, and users should just see a benefit from it; unless they were relying on that field being markdown, which seems odd
  • Expose a separate field (excerptHtml) and keep excerpt as is
  • Expose arguments, e.g. format: 'plain' (this should be the default) and format: 'html'

I think I prefer the last approach. What are your thoughts @nadrane?

@nadrane
Copy link
Contributor Author

nadrane commented Nov 20, 2018

it sounds like you have a wayyy better understanding of the use cases of this field than I do! How are excerpts used for meta tags or SEO, just out of curiosity?

So it sounds like we're in alignment to go with option 3, where we have an excerpt field with a format option. I assume we should keep the default as plaintext because, as you explained, there's no clear winner between plain vs. html and for backwards compatibility reasons.

I'll modify the PR with these changes in mind! looking forward to getting this merged!

@DSchau
Copy link
Contributor

DSchau commented Nov 20, 2018

@nadrane here's an example I've used in the past :) https://github.com/DSchau/blog/blob/master/src/templates/blog-post.js#L32-L34

@DSchau
Copy link
Contributor

DSchau commented Nov 20, 2018

@nadrane I'm looking forward to getting this merged too! Thanks for being patient and working with us on this, I think this is a great change 🎉

@nadrane
Copy link
Contributor Author

nadrane commented Nov 25, 2018

I'm just about done with this PR. I'm struggling to test code related to the excerpt_separator. I need to understand how plugin options are passed in, so I've been trying quite unsuccessfully to use this modified package inside a normal Gatsby project (up until now all my testing has been inside unit tests)

I've tried a couple things

  1. yarn link causes a bunch of GraphQL errors related to declaring the same fields multiple times
  2. yarn add <localfilepath> simply ignores my code changes. Somehow it still is using the original gatsby-transformer-remark code.

How the heck am I supposed to integrate this plugin with the greater system? Are there are docs around?

@DSchau
Copy link
Contributor

DSchau commented Nov 26, 2018

@nadrane awesome! As far as testing the changes locally - you better believe it!

Check out gatsby-dev-cli, specifically check out these instructions for getting up and running with it. Let us know if you need any help!

@@ -373,8 +389,59 @@ module.exports = (
type: GraphQLBoolean,
defaultValue: false,
},
format: {
type: GraphQLString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use an enum type here? It seems like only plain and html are options, and designating string could lead to subtle bugs (e.g. typos or something!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Great catch.

@nadrane
Copy link
Contributor Author

nadrane commented Nov 27, 2018

alright, I think this is finally ready for review! Thanks for the gatsby-dev-cli tips. Worked like a charm

@DSchau
Copy link
Contributor

DSchau commented Nov 27, 2018

@nadrane pulling this down to test right now. Awesome job thus far 👌

Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated with this repo, particularly see this file which I used to validate the approach.

This looks great and I think is a nice, new improvement while also maintaining backwards compatibility.

One final thing I think we should do is actually document the feature. I'd recommend the README in gatsby-transformer-remark. Sound like something we can do? Once that's done I'll go ahead and merge!

@nadrane
Copy link
Contributor Author

nadrane commented Nov 27, 2018

I think we're ready!

@DSchau
Copy link
Contributor

DSchau commented Nov 27, 2018

@nadrane small little merge conflict! Want to take a look? Otherwise I can iron it out!

@nadrane
Copy link
Contributor Author

nadrane commented Nov 27, 2018

@DSchau not sure how that happened. I fixed it :)

Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for all your work on this 💪

@DSchau DSchau merged commit fbe4fcb into gatsbyjs:master Nov 27, 2018
@gatsbot
Copy link

gatsbot bot commented Nov 27, 2018

Holy buckets, @nadrane — we just merged your PR to Gatsby! 💪💜

Gatsby is built by awesome people like you. Let us say “thanks” in two ways:

  1. We’d like to send you some Gatsby swag. As a token of our appreciation, you can go to the Gatsby Swag Store and log in with your GitHub account to get a coupon code good for one free piece of swag. (Currently we’ve got a couple t-shirts available, plus some socks that are really razzing our berries right now.)
  2. We just invited you to join the Gatsby organization on GitHub. This will add you to our team of maintainers. Accept the invite by visiting https://github.com/orgs/gatsbyjs/invitation. By joining the team, you’ll be able to label issues, review pull requests, and merge approved pull requests.

If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’.

Thanks again!

@DSchau
Copy link
Contributor

DSchau commented Nov 27, 2018

Successfully published:
 - gatsby-image@2.0.21
 - **gatsby-transformer-remark@2.1.13**
 - gatsby@2.0.56

@LekoArts LekoArts mentioned this pull request Dec 4, 2018
gpetrioli pushed a commit to gpetrioli/gatsby that referenced this pull request Jan 22, 2019
This is the beginnings of a fix for gatsbyjs#4459

Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the `pruneLength` setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.

I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:

1. The work in the `excerpt` resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.

2. We are invoking `getMarkdownAST` (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

3. I tore apart the `getAST` function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

I'd love to hear people's thoughts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants