Fix/4459/excerpt formatting ignored #9716

nadrane · 2018-11-05T17:46:51Z

This is the beginnings of a fix for #4459

Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the pruneLength setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.

I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:

The work in the excerpt resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.
We are invoking getMarkdownAST (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.
I tore apart the getAST function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

I'd love to hear people's thoughts.

pieh · 2018-11-05T20:08:41Z

Did you check #5586 which tackles same problem?

2. We are invoking getMarkdownAST (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

There's no way around it I think. Unless excerpt separator is visible in markdown AST, then we could use same AST and break manually on separator (but I don't think it's there)

3. I tore apart the getAST function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

🙏 At some point I would like to have cache.wrap that would exactly what new version of getAST does but hidden from end user (kind of like _.memoize), so you can wrap your async function in cache.

nadrane · 2018-11-05T20:48:04Z

There's no way around it I think. Unless excerpt separator is visible in markdown AST, then we could use same AST and break manually on separator (but I don't think it's there)

I can look to see if the separator is in the AST...

One alternative is that we could just not run plugins on the extract. This is an unfortunate limitation, though it's definitely better than the current situation and might be better than potentially breaking all/half of the plugins that use the gatsby-transformer-remark API. How would you feel about this approach?

nadrane · 2018-11-05T22:50:51Z

So the comment does make its way into the markdown AST. With that said, removing it (and everything that follows it) is not trivial. At first I thought it would be simple, but as I looked at the markdown AST data structure a bit closer, I realized that it's storing position relative to the original text. Here is an example:

node { type: 'root',
        children:
         [ { type: 'paragraph', children: [Array], position: [Object] },
           { type: 'html', value: '<!-- end -->', position: [Object] },
           { type: 'paragraph', children: [Array], position: [Object] },
           { type: 'paragraph', children: [Array], position: [Object] } ],
        position:
         { start: { line: 1, column: 1, offset: 0 },
           end: { line: 6, column: 5, offset: 1013 } } }

You can see that the AST stores the offsets for the beginning and ending of every node. If I were to remove a subset of nodes from the AST, I would need to update all these positions. Unfortunately, as far as I can tell, there aren't any helper utilities available online to make these AST updates for me.

I think I'm going to continue with the current approach and rely on calling getAST twice.

… is provided

nadrane · 2018-11-10T04:56:09Z

This PR is mostly ready aside from updating some snapshots. The code should be pretty much finalized. The tests need a little help.

Let me give you an overview of what I've done

I was able to successfully parse the markdown AST for the excerpt_separator, allowing us to avoid calling getAST twice and risking calling plugins multiple times. I did this by placing the excerpt_separator on the markdownNode. I couldn’t figure out another way to make it accessible to the extend-node-type file. If there is a better strategy, I’d love to learn about it.
I had to write quite a bit of code to walk the html AST. Given the amount of caching similar prior code does, I got the impression that it is particularly performance sensitive. I’m doing quite a bit of tree traversal, sometimes invoking the same function multiple times. This could be optimized with some simple memoization if you think that would be valuable. My suspicion is it’s probably fine without it, though (unless maybe an excerpt were particularly long and full of many nodes).
I wasn’t sure how to get the gatsby-node tests passing again after changing the way excerpts are calculating when an excerpt_separator is present. These tests are clearly not calling the extend-node-type code, where the excerpt_separator is now handled, causing 2 tests to fail. I would love some guidance on this. Or, if it’s ok, I could also just remove the tests, though it’s not clear to me if we would then be sacrificing some degree of test coverage
The changes are slightly breaking. Even if a user previously never used markdown with their excerpts, we at the very least wrap ever excerpt in a <p> tag. This could definitely break styling on existing websites. I don't really see another option, however

Looking forward to hearing your thoughts

pieh · 2018-11-10T13:41:24Z

I couldn’t figure out another way to make it accessible to the extend-node-type file. If there is a better strategy, I’d love to learn about it.

You should have access to it via pluginOptions.excerpt_separator (to not copy it to every markdown node)

3. I wasn’t sure how to get the gatsby-node tests passing again after changing the way excerpts are calculating when an excerpt_separator is present. These tests are clearly not calling the extend-node-type code, where the excerpt_separator is now handled, causing 2 tests to fail. I would love some guidance on this. Or, if it’s ok, I could also just remove the tests, though it’s not clear to me if we would then be sacrificing some degree of test coverage

4. The changes are slightly breaking. Even if a user previously never used markdown with their excerpts, we at the very least wrap ever excerpt in a <p> tag. This could definitely break styling on existing websites. I don't really see another option, however

Yeah, this would be breaking change so would need major version bump (if merged as-is). Will think if we should do intermediate step of keeping old field (and deprecate it) and add new field (i.e excertptHtml with your code behind it). There's lot of content using <p>${markdownRemark.frontmatter.excerpt}</p> around and with this change we would need to use dangerouslySetInnerHTML, so transition period would be needed here.

nadrane · 2018-11-10T14:54:10Z

You should have access to it via pluginOptions.excerpt_separator (to not copy it to every markdown node)

I was able to access the excerpt_separator using pluginOptions in the on-node-create file but not in the extend-node-type file. In the latter file it was empty. It sounds like you're saying this is not the expected behavior.

Yeah, this would be breaking change so would need major version bump (if merged as-is). Will think if we should do intermediate step of keeping old field (and deprecate it) and add new field (i.e excertptHtml with your code behind it). There's lot of content using
${markdownRemark.frontmatter.excerpt}
around and with this change we would need to use dangerouslySetInnerHTML, so transition period would be needed here.

Let me know how you'd like to approach it. I'm fine creating a new GraphQL field in the interim.

pieh · 2018-11-10T20:06:27Z

I was able to access the excerpt_separator using pluginOptions in the on-node-create file but not in the extend-node-type file. In the latter file it was empty. It sounds like you're saying this is not the expected behavior.

was it happening in jests tests or when you were running gatsby with your changes applied?

nadrane · 2018-11-11T14:06:00Z

was it happening in jests tests or when you were running gatsby with your changes applied?

Yeah it was happening through Jest tests. Sounds like they aren't set up quite correctly?

pieh · 2018-11-11T15:05:09Z

Yeah it was happening through Jest tests. Sounds like they aren't set up quite correctly?

Seems like it -

gatsby/packages/gatsby-transformer-remark/src/__tests__/extend-node.js

Lines 34 to 36 in 76037bf

    
           { 
        
             plugins: [], 
        
           }

here we would need to pass pluginOptions (which includes excerpt_separator)

I guess this wasn't needed before because pluginOptions weren't used in tested code

nadrane · 2018-11-12T18:42:58Z

Ah, that makes sense. Thank you.

As far as a final decision on this PR goes, are you settled on creating a excerptHTML field and leaving the existing excerpt field as-is? If so, I've got a plane ride in a few hours and will probably do the work then.

nadrane · 2018-11-16T03:55:12Z

@pieh Hey, just checking in, what are you thinking for a final decision on this one?

DSchau

Few notes:

I think this is a great change. I've personally run into some weirdness with this, so I make an excerpt field on my frontmatter that I pull from first and then fallback to excerpt made available automatically on the markdown node.

I think there are a few approaches here, all valid/useful in different ways. The overarching principle here should be that users could be relying on old behavior (e.g. using excerpt for plain text, and then getting HTML) so we can't do this cleanly without a breaking change and a major version bump which is OK but not necessarily ideal.

To avoid that, I have a few ideas:

Use something like remove-markdown to remove markdown characters
- This should (🤞) not really be a breaking change, and users should just see a benefit from it; unless they were relying on that field being markdown, which seems odd
Expose a separate field (excerptHtml) and keep excerpt as is
Expose arguments, e.g. format: 'plain' (this should be the default) and format: 'html'

I think I prefer the last approach. What are your thoughts @nadrane?

nadrane · 2018-11-20T16:44:44Z

it sounds like you have a wayyy better understanding of the use cases of this field than I do! How are excerpts used for meta tags or SEO, just out of curiosity?

So it sounds like we're in alignment to go with option 3, where we have an excerpt field with a format option. I assume we should keep the default as plaintext because, as you explained, there's no clear winner between plain vs. html and for backwards compatibility reasons.

I'll modify the PR with these changes in mind! looking forward to getting this merged!

DSchau · 2018-11-20T16:45:42Z

@nadrane here's an example I've used in the past :) https://github.com/DSchau/blog/blob/master/src/templates/blog-post.js#L32-L34

DSchau · 2018-11-20T16:51:19Z

@nadrane I'm looking forward to getting this merged too! Thanks for being patient and working with us on this, I think this is a great change 🎉

nadrane · 2018-11-25T22:25:08Z

I'm just about done with this PR. I'm struggling to test code related to the excerpt_separator. I need to understand how plugin options are passed in, so I've been trying quite unsuccessfully to use this modified package inside a normal Gatsby project (up until now all my testing has been inside unit tests)

I've tried a couple things

yarn link causes a bunch of GraphQL errors related to declaring the same fields multiple times
yarn add <localfilepath> simply ignores my code changes. Somehow it still is using the original gatsby-transformer-remark code.

How the heck am I supposed to integrate this plugin with the greater system? Are there are docs around?

DSchau · 2018-11-26T13:53:25Z

@nadrane awesome! As far as testing the changes locally - you better believe it!

Check out gatsby-dev-cli, specifically check out these instructions for getting up and running with it. Let us know if you need any help!

DSchau · 2018-11-26T13:56:12Z

packages/gatsby-transformer-remark/src/extend-node-type.js

@@ -373,8 +389,59 @@ module.exports = (
            type: GraphQLBoolean,
            defaultValue: false,
          },
+          format: {
+            type: GraphQLString,


Do we want to use an enum type here? It seems like only plain and html are options, and designating string could lead to subtle bugs (e.g. typos or something!).

Definitely. Great catch.

nadrane · 2018-11-27T04:39:48Z

alright, I think this is finally ready for review! Thanks for the gatsby-dev-cli tips. Worked like a charm

DSchau · 2018-11-27T17:05:06Z

@nadrane pulling this down to test right now. Awesome job thus far 👌

DSchau

Validated with this repo, particularly see this file which I used to validate the approach.

This looks great and I think is a nice, new improvement while also maintaining backwards compatibility.

One final thing I think we should do is actually document the feature. I'd recommend the README in gatsby-transformer-remark. Sound like something we can do? Once that's done I'll go ahead and merge!

nadrane · 2018-11-27T21:24:37Z

I think we're ready!

DSchau · 2018-11-27T21:25:48Z

@nadrane small little merge conflict! Want to take a look? Otherwise I can iron it out!

nadrane · 2018-11-27T21:32:36Z

@DSchau not sure how that happened. I fixed it :)

DSchau

Looks great. Thanks for all your work on this 💪

gatsbot · 2018-11-27T21:56:15Z

Holy buckets, @nadrane — we just merged your PR to Gatsby! 💪💜

Gatsby is built by awesome people like you. Let us say “thanks” in two ways:

We’d like to send you some Gatsby swag. As a token of our appreciation, you can go to the Gatsby Swag Store and log in with your GitHub account to get a coupon code good for one free piece of swag. (Currently we’ve got a couple t-shirts available, plus some socks that are really razzing our berries right now.)
We just invited you to join the Gatsby organization on GitHub. This will add you to our team of maintainers. Accept the invite by visiting https://github.com/orgs/gatsbyjs/invitation. By joining the team, you’ll be able to label issues, review pull requests, and merge approved pull requests.

If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’.

Thanks again!

DSchau · 2018-11-27T21:58:05Z

Successfully published:
 - gatsby-image@2.0.21
 - **gatsby-transformer-remark@2.1.13**
 - gatsby@2.0.56

This is the beginnings of a fix for gatsbyjs#4459 Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the `pruneLength` setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached. I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts: 1. The work in the `excerpt` resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this. 2. We are invoking `getMarkdownAST` (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice. 3. I tore apart the `getAST` function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability. I'd love to hear people's thoughts.

nadrane added 6 commits November 5, 2018 10:54

extract markdownNodeAST function

505a12a

use async/await better

099fcd0

delete unncessary promise to make code more readable

dea318a

extract remark target parameter

87a421d

update tests and return html excerpt

af59b30

update snapshots

385f2ab

nadrane added 11 commits November 5, 2018 18:34

begin refactoring code to grab exerpts in the case where no separator…

1613bbe

… is provided

make headway on ast cloning

b431269

abandon generalizing preorder traversal. tree duplication works

7b40c31

pruning mostly works

4a31459

get the complicated test expectation correct

440db7a

rename complicated test

c7b9cdd

refactor out tree cloning

cdc2324

make progress towards pruning last text node

15932e7

tests passing

235daa2

don't invoke plugin functions a second time if excerpt_separator used

830e6fc

code works

76037bf

migrate to excerpt and exerptHTML fields

97cfabb

DSchau reviewed Nov 19, 2018

View reviewed changes

nadrane added 4 commits November 25, 2018 14:15

merge master

014ab92

print graphql errors properly

9fc3aa2

excerpt printing in html

df3dcf2

more progress

2249f7c

DSchau reviewed Nov 26, 2018

View reviewed changes

nadrane added 2 commits November 26, 2018 22:21

final changes to excerpt separator handling

4e4c498

use enum type for format arg

64152c7

undo accidental jest config changes

4db7c57

DSchau reviewed Nov 27, 2018

View reviewed changes

update README

e755e9b

Merge branch 'master' into fix/4459/excerpt-formatting-ignored

a8c1f05

Update README.md

dc120d9

DSchau approved these changes Nov 27, 2018

View reviewed changes

DSchau merged commit fbe4fcb into gatsbyjs:master Nov 27, 2018

nadrane mentioned this pull request Nov 29, 2018

anchor tags not clickable in excerpts #10194

Closed

pieh mentioned this pull request Nov 29, 2018

chore(gatsby-transformer-remark): update gitignore #10199

Merged

LekoArts mentioned this pull request Dec 4, 2018

Bugfix 4459 #5586

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/4459/excerpt formatting ignored #9716

Fix/4459/excerpt formatting ignored #9716

nadrane commented Nov 5, 2018

pieh commented Nov 5, 2018

nadrane commented Nov 5, 2018

nadrane commented Nov 5, 2018 •

edited

Loading

nadrane commented Nov 10, 2018

pieh commented Nov 10, 2018

nadrane commented Nov 10, 2018

pieh commented Nov 10, 2018

nadrane commented Nov 11, 2018

pieh commented Nov 11, 2018 •

edited

Loading

nadrane commented Nov 12, 2018

nadrane commented Nov 16, 2018

DSchau left a comment

nadrane commented Nov 20, 2018

DSchau commented Nov 20, 2018

DSchau commented Nov 20, 2018

nadrane commented Nov 25, 2018

DSchau commented Nov 26, 2018

DSchau Nov 26, 2018

nadrane Nov 26, 2018

nadrane commented Nov 27, 2018

DSchau commented Nov 27, 2018

DSchau left a comment

nadrane commented Nov 27, 2018

DSchau commented Nov 27, 2018

nadrane commented Nov 27, 2018

DSchau left a comment

gatsbot bot commented Nov 27, 2018

DSchau commented Nov 27, 2018

Fix/4459/excerpt formatting ignored #9716

Fix/4459/excerpt formatting ignored #9716

Conversation

nadrane commented Nov 5, 2018

pieh commented Nov 5, 2018

nadrane commented Nov 5, 2018

nadrane commented Nov 5, 2018 • edited Loading

nadrane commented Nov 10, 2018

pieh commented Nov 10, 2018

nadrane commented Nov 10, 2018

pieh commented Nov 10, 2018

nadrane commented Nov 11, 2018

pieh commented Nov 11, 2018 • edited Loading

nadrane commented Nov 12, 2018

nadrane commented Nov 16, 2018

DSchau left a comment

Choose a reason for hiding this comment

nadrane commented Nov 20, 2018

DSchau commented Nov 20, 2018

DSchau commented Nov 20, 2018

nadrane commented Nov 25, 2018

DSchau commented Nov 26, 2018

DSchau Nov 26, 2018

Choose a reason for hiding this comment

nadrane Nov 26, 2018

Choose a reason for hiding this comment

nadrane commented Nov 27, 2018

DSchau commented Nov 27, 2018

DSchau left a comment

Choose a reason for hiding this comment

nadrane commented Nov 27, 2018

DSchau commented Nov 27, 2018

nadrane commented Nov 27, 2018

DSchau left a comment

Choose a reason for hiding this comment

gatsbot bot commented Nov 27, 2018

DSchau commented Nov 27, 2018

nadrane commented Nov 5, 2018 •

edited

Loading

pieh commented Nov 11, 2018 •

edited

Loading