RSS feed structure audit #2263

Closed
JohnONolan opened this issue Feb 25, 2014 · 27 comments · May be fixed by mdno/Ghost#71, amgir/Ghost#70 or fooiskandar/Ghost#26

@JohnONolan
Member

Let's do a bit of an audit of Ghost RSS feeds vs WP RSS feeds, because there's quite a lot of difference at the moment. Eg.

http://john.onolan.org/rss/

vs

http://blog.kissmetrics.com/feed/

I'm fine with the two being different, but I'd like to make sure we're structuring feeds differently for the right (or at least some) reasons.

Are we using the correct tags? Are we missing any?

@JohnONolan JohnONolan added this to the 0.6 milestone Feb 25, 2014
@javorszky
Contributor

For reference: http://www.rssboard.org/rss-specification

This was referenced Mar 4, 2014
@JohnONolan
Member Author

http://dev.ghost.org/rss/2/ - should not be a thing. RSS feed = feed, not paginated.

@stenehall
Contributor

While I agree with JohnONolan that it feels a bit weird, both RSS and Atom seem to support it: http://tools.ietf.org/html/rfc5005#section-3 and http://www.ibm.com/developerworks/library/x-tipatom2/

One more thing that I found while coding on #2260: the RSS feed doesn't seem to respect the admin setting for pagination, but always uses 15 posts as the page size.

@halfdan
Contributor

halfdan commented Mar 5, 2014

@stenehall Just because it exists doesn't mean it's right to do it :) The actual question is: Who subscribes to the second page (?) of an RSS feed?

A thing that makes a lot more sense is a config option to define how many items should end up in the feed (or should it be tied to the number of posts on a page - which it probably should). There's no RSS reader that would pick up the second page of our RSS feeds - Feed readers check for new posts in the feed but never try to paginate (because there's no standard on how that would happen).
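To illustrate the suggestion above of tying the feed length to the posts-per-page setting instead of a fixed 15, here is a minimal sketch. The names `feedItemLimit`, `feedItems` and the `postsPerPage` key are assumptions made for illustration and are not taken from Ghost's code.

```javascript
// Hypothetical helpers: names and the settings shape are illustrative only.
const DEFAULT_FEED_ITEMS = 15; // the hard-coded page size mentioned in the thread

function feedItemLimit(settings) {
    const value = Number.parseInt(settings && settings.postsPerPage, 10);
    // Fall back to the default when the setting is missing or not a positive integer.
    return Number.isInteger(value) && value > 0 ? value : DEFAULT_FEED_ITEMS;
}

function feedItems(posts, settings) {
    // Posts are assumed to be sorted newest first already.
    return posts.slice(0, feedItemLimit(settings));
}
```

With a setting of 5, `feedItems(posts, { postsPerPage: 5 })` would return the five newest posts, and a missing setting falls back to 15.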

@stenehall
Contributor

@halfdan Totally agree. Just had to google it to read up a bit on it. I fully agree that it can be removed.

@julien51

@halfdan There are actually several people/tools that use the 2nd page (and further) when available. I'm thinking about import/export tools for example. But some traditional feed readers do that as well so they can serve "out of band" data to their users when they do searches for example. Now, I understand this is a rather "minor" feature.

@halfdan
Contributor

halfdan commented Mar 11, 2014

@julien51 How do feed readers know that there's a second page? For import/export we have the ugly debug tools - so it's easy to export your blog and import it somewhere else.

@julien51

@halfdan 2 ways:

  • there is an RFC
  • trial and error. The CMS landscape is actually pretty concentrated. If you look into the code of the top 5 of them, you can quickly cover a large part of that market.

And of course import/export with other tools, but in practice not all combinations work well, and RSS is just a universal (yet oftentimes lossy) way of achieving this. My memory fails me but IIRC a lot of Posterous importers were based on RSS because that just works =)

But again, I understand (and agree) this is a very small subset of what RSS is used for. I just wanted to give my couple cents here and maybe leave that discussion open for a much later time when everything with a higher priority has been covered.
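For context on the RFC route: RFC 5005 (linked earlier in the thread) lets a paged feed advertise its neighbours through link relations such as `first`, `last`, `previous` and `next`, which is how a reader can discover a second page without guessing. The sketch below only illustrates that mechanism. The function names are hypothetical, and the `/rss/2/` URL pattern is assumed from the dev.ghost.org example above.

```javascript
// Illustrative only: function names and the URL scheme are assumptions, not Ghost code.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;');
}

// Build the RFC 5005 paging relations for one page of a feed.
function pagedFeedLinks(feedUrl, page, totalPages) {
    const urlFor = (n) => (n === 1 ? feedUrl : `${feedUrl}${n}/`);
    const links = [
        { rel: 'first', href: urlFor(1) },
        { rel: 'last', href: urlFor(totalPages) }
    ];
    if (page > 1) {
        links.push({ rel: 'previous', href: urlFor(page - 1) });
    }
    if (page < totalPages) {
        links.push({ rel: 'next', href: urlFor(page + 1) });
    }
    return links;
}

// Serialise one relation as an Atom link element (usable in RSS via the atom namespace).
function toAtomLink(link) {
    return `<atom:link rel="${escapeAttr(link.rel)}" href="${escapeAttr(link.href)}"/>`;
}
```

A reader following the RFC would look for the `next` relation and stop when it is absent, rather than probing URLs by trial and error.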

@haneefmubarak

After seeing #2332, I just wanted to put in my two cents and say that it appears to be vastly more common to use RSS feeds for summaries as opposed to full articles, especially since full articles can consume much more bandwidth in comparison to short summaries. Here are a few websites that show this example:

I could keep going, but I think this covers it rather well.

@haneefmubarak

Perhaps there could be an option in the settings or a secondary feed for those who do want full story feeds?

@ErisDS
Member

ErisDS commented Apr 21, 2014

@haneefmubarak options are the opposite of what Ghost is about. As expressed in #2332 (which would have been a more suitable place for your comment, btw) this is the kind of thing an App could/should do.

@ErisDS ErisDS mentioned this issue Jul 1, 2014
26 tasks
@ErisDS ErisDS modified the milestones: 0.5.x Feature Release, 0.5 Multi-user Jul 8, 2014
@ErisDS
Member

ErisDS commented Aug 19, 2014

#2777 is a valid issue with our current RSS implementation

@ErisDS
Member

ErisDS commented Aug 19, 2014

This audit has been open / pending for a while. Part of the problem we have is that there is no good alternative library for generating RSS feeds, and the existing one is broken in places and not very extensible.

There has been talk of a new library in the works by @halfdan but I think it may be too much work. Wondering if there is a simpler solution, just using a .hbs template which contains XML?
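To give a rough idea of what the template approach might involve, the sketch below uses a plain JavaScript template literal as a stand-in for a .hbs template. It is not Handlebars and not Ghost's implementation; `escapeXml`, `renderFeed` and the field names are all hypothetical. It only shows that a minimal RSS 2.0 document (channel `title`, `link`, `description` plus items) can be produced from a template as long as values are escaped.

```javascript
// Stand-in for a .hbs template: a template literal plus manual XML escaping.
// All names here are illustrative assumptions.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

function renderFeed(feed) {
    const items = feed.items.map((item) => [
        '    <item>',
        `      <title>${escapeXml(item.title)}</title>`,
        `      <link>${escapeXml(item.url)}</link>`,
        `      <pubDate>${new Date(item.publishedAt).toUTCString()}</pubDate>`,
        '    </item>'
    ].join('\n'));

    return [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<rss version="2.0">',
        '  <channel>',
        `    <title>${escapeXml(feed.title)}</title>`,
        `    <link>${escapeXml(feed.url)}</link>`,
        `    <description>${escapeXml(feed.description)}</description>`,
        ...items,
        '  </channel>',
        '</rss>'
    ].join('\n');
}
```

The trade-off is that escaping and well-formedness become the template author's responsibility, which a generator library would otherwise handle.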

@ErisDS
Member

ErisDS commented Mar 23, 2015

I've taken a brief look at this today, and identified a few things that we definitely need to (and also can quite easily) fix:

  • Content-Type header
    I changed this a little while ago, because Chrome was doing something awful with text/xml that made the page look like an error to many users. It seems that Chrome has since bucked its ideas up and we should go back to text/xml.
  • Content vs Description
    We very deliberately do not include just a description in our RSS feeds, however we have been misusing the description field to include the full content. Now that node-rss supports custom namespaces and fields, we should use the content module, and change the description to include just the meta description.
  • PubDate vs lastBuildDate and ETAGs
    This bug describes an issue with ETAGs on our RSS feeds, however unfortunately simply setting the pubDate doesn't solve the problem because the lastBuildDate will still get updated every time the feed is requested. We need to either convince the node-rss library to change this behaviour, or we need to change how and when we generate our RSS so that it doesn't keep changing.
  • Overuse of CDATA
    One thing that's pretty glaring when you compare our feeds to other people's is the use of CDATA to wrap everything. This is unnecessary, it's only really an aesthetic problem, but it's already raised as an issue on node-rss so perhaps we can get this improved.
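The lastBuildDate point in the list above can be sketched as follows: derive the date from the newest post rather than from the time of the request, so the generated XML (and any ETag computed from the response body) only changes when content changes. This is a minimal sketch under stated assumptions. The `updatedAt` field and the `lastBuildDate` helper are hypothetical, and nothing here is node-rss's API.

```javascript
// Hypothetical helper: derive a stable lastBuildDate from post data instead of "now".
function lastBuildDate(posts) {
    const times = posts
        .map((post) => new Date(post.updatedAt).getTime())
        .filter((time) => !Number.isNaN(time)); // ignore posts with unparseable dates

    if (times.length === 0) {
        return null; // no posts: let the caller decide what to emit
    }
    // toUTCString() produces the "Mon, 23 Mar 2015 10:00:00 GMT" style used in RSS dates.
    return new Date(Math.max(...times)).toUTCString();
}
```

Two requests for an unchanged blog then produce byte-identical feeds, which is what conditional requests rely on.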

ErisDS added a commit to ErisDS/Ghost that referenced this issue Apr 5, 2015
fixes TryGhost#5104, refs TryGhost#4348, TryGhost#2263

- Create a centralised event module
- Hook it up for posts, pages, tags and users
- Use it in sitemaps instead of direct method calls
- Use it for xmlrpc calls
- Check events are fired in model tests
- Update sitemap tests to work with new code
- Fix a bug where invited users were appearing in sitemaps
- Move sitemaps and xmlrpc into a directory together
@ErisDS
Member

ErisDS commented Apr 5, 2015

@amm4108 ah I see! Still it seems like it would be the wrong thing to do, to add those links making the pagination more discoverable when ultimately we plan to remove it as soon as the API becomes properly available?

@ErisDS
Member

ErisDS commented Apr 5, 2015

From #5065 (comment), @adrianmacneil said:

Would it also be possible to include the post image as a separate element? It seems there are a few different standards for this, possibly the media:content tag is the right choice for this (and anecdotally is supported in mailchimp which is what we need it for).

I have done a bit of research into this, looking at example fields, and it certainly appears to be the most widely supported way of including images in RSS. It wasn't done this way in the original issue for two reasons: we weren't able to add custom elements to the feed, and although media:content is the most widely supported option, it is still not very widely supported. It appears to be supported by both mailchimp and campaignmonitor, which makes it very useful for designing nicer email campaigns. However, it seems that IFTTT pulls images from the content, as does feedly.

Now that we are able to add custom elements & namespaces, adding this is straightforward. However, I think it needs to be added in addition to prepending the image to the content:encoded element, while no longer prepending it to the description element. This seems to create the best balance: feeds look good in feedly & other readers, whilst still being customisable in rss-to-email campaigns.
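The balance described above could be sketched like this: the image goes into a `media:content` element and is prepended to `content:encoded`, while `description` stays image-free. This is an illustration only. `buildItemFields` and the post fields `html`, `metaDescription` and `image` are hypothetical names, and the output is plain strings rather than any library's item format.

```javascript
// Illustrative sketch: field and function names are assumptions, not Ghost's code.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

function buildItemFields(post) {
    const fields = {
        description: post.metaDescription || '', // never contains the image
        contentEncoded: post.html,                // full post content
        mediaContent: null                        // only set when a cover image exists
    };

    if (post.image) {
        const src = escapeAttr(post.image);
        // Readers that pull images from the content see it here...
        fields.contentEncoded = `<img src="${src}" />${post.html}`;
        // ...while rss-to-email tools can read it from media:content.
        fields.mediaContent = `<media:content url="${src}" medium="image"/>`;
    }
    return fields;
}
```

A feed using `media:content` also needs the Media RSS namespace declared on the root element.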

@amacneil

amacneil commented Apr 6, 2015

Thanks @ErisDS, that solution works for us. We will be using the description element with mailchimp (for now), so having the image excluded from description will be fine.

It's a shame feedly does not support the media:content element so we could keep description and content:encoded consistent.

ErisDS added a commit to ErisDS/Ghost that referenced this issue Apr 6, 2015
refs TryGhost#2263, TryGhost#4888

- Adds media:content element to Ghost RSS feeds containing the post cover image if one is available
- Removes the prepending of the image to the `<description>` field
- Keeps the prepending of the image in `<content:encoded>`
ErisDS added a commit to ErisDS/Ghost that referenced this issue Apr 10, 2015
refs TryGhost#2263, TryGhost#4888

- Adds media:content element to Ghost RSS feeds containing the post cover image if one is available
- Removes the prepending of the image to the `<description>` field
- Keeps the prepending of the image in `<content:encoded>`
ErisDS added a commit to ErisDS/Ghost that referenced this issue Apr 10, 2015
refs TryGhost#5091, refs TryGhost#2263

- Move rss handling out of the frontend controller and into its own module
- Separate the code into logical blocks
- Wrap the generation code in an in-memory cache to prevent it being regenerated on every request
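
The caching step in the commit message above might look roughly like the following. This is a generic memoisation sketch, not the code from the referenced commit. `createFeedCache` and its methods are hypothetical names, and when to invalidate (for example when a post changes) is left to the caller.

```javascript
// Generic in-memory cache sketch; names are illustrative, not from the commit.
function createFeedCache(generate) {
    const cache = new Map();

    return {
        // Return the cached feed for this key, generating it only on first use.
        get(key) {
            if (!cache.has(key)) {
                cache.set(key, generate(key));
            }
            return cache.get(key);
        },
        // Drop everything, e.g. after content has been added or edited.
        invalidate() {
            cache.clear();
        }
    };
}
```

Because the cached XML is reused verbatim, this also keeps the response body stable between requests.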
@ErisDS
Member

ErisDS commented Apr 20, 2015

With the exception of sy:updatePeriod, sy:updateFrequency and some stuff related to comments, which isn't currently relevant to Ghost, the two feeds are now pretty much identical. I've also spent some time looking at a lot of other feeds, and looking at how Ghost feeds work with 3rd parties, and I'm pretty happy that we've got a much better offering now.

With regard to sy:updatePeriod and sy:updateFrequency, I think these might be worth adding in future if we do more work around coming up with smart numbers for these to use both in RSS and the sitemap. But with the lastBuildDate & ETAGs now working properly on our feeds, I'm not convinced adding defaults for these would add value right now.

I think this issue can be closed?

@ErisDS ErisDS removed the help wanted [triage] Ideal issues for contributors to help with label Apr 20, 2015
@HLFH

HLFH commented Apr 20, 2015

@ericds Yes, this issue could be closed. (Too bad that summary feeds won't exist before the hypothetical release of Ghost Apps).
