fixed indexing of external posts (#2983)
This should fix several issues with indexing external posts, including
#1828.

In short, the indexing problem was that the index builder was receiving
'empty' documents. To fix that, I'm setting the document content to the
post content retrieved from the RSS feed, or to the text extracted from
the external page.

I've tested with various blog sources and it seems to be working as
expected now.
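
The effect of the change can be illustrated without Jekyll. This is only a minimal sketch under stated assumptions: `FakeDoc` and `index_terms` are hypothetical stand-ins for a Jekyll document and for an index builder that reads nothing but `content`. Neither name exists in the plugin or in Jekyll, and the real indexer may work differently.

```ruby
# Hypothetical stand-ins; neither name exists in Jekyll or in the plugin.
FakeDoc = Struct.new(:data, :content)

# A toy "index builder": it only looks at the document body.
def index_terms(doc)
  doc.content.to_s.downcase.scan(/[a-z0-9]+/).uniq
end

post = { summary: 'A short summary', content: 'Full text of the external post' }

# Before the fix: metadata is set, but the body is never assigned.
before = FakeDoc.new({ 'description' => post[:summary] }, nil)

# After the fix: the body carries the feed text (or the scraped page text).
after = FakeDoc.new({ 'description' => post[:summary] }, post[:content])

puts index_terms(before).inspect # nothing to index
puts index_terms(after).inspect  # terms taken from the post body
```

Under these assumptions, the document built without a body contributes no terms at all, which matches the 'empty' documents described above.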
niebles authored Jan 27, 2025
1 parent 15fc779 commit b50db2e
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions _plugins/external-posts.rb
@@ -62,6 +62,7 @@ def create_document(site, source_name, url, content)
  doc.data['description'] = content[:summary]
  doc.data['date'] = content[:published]
  doc.data['redirect'] = url
+ doc.content = content[:content]
  site.collections['posts'].docs << doc
  end

@@ -90,8 +91,12 @@ def fetch_content_from_url(url)
  parsed_html = Nokogiri::HTML(html)

  title = parsed_html.at('head title')&.text.strip || ''
- description = parsed_html.at('head meta[name="description"]')&.attr('content') || ''
- body_content = parsed_html.at('body')&.inner_html || ''
+ description = parsed_html.at('head meta[name="description"]')&.attr('content')
+ description ||= parsed_html.at('head meta[name="og:description"]')&.attr('content')
+ description ||= parsed_html.at('head meta[property="og:description"]')&.attr('content')
+
+ body_content = parsed_html.search('p').map { |e| e.text }
+ body_content = body_content.join() || ''

  {
  title: title,
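
Two details of the new extraction logic can be checked in plain Ruby, without Nokogiri. The sketch below is illustrative only: the `meta` Hash and its keys stand in for the three `parsed_html.at(...)&.attr('content')` lookups, and `pick_description` is a hypothetical name, not a function from the plugin.

```ruby
# Stand-in for the meta-tag lookups: a missing tag yields nil,
# as `&.attr('content')` does when `at` finds nothing.
def pick_description(meta)
  description = meta['description']
  description ||= meta['og:description (name)']
  description ||= meta['og:description (property)']
  description # stays nil when no tag is present; the old `|| ''` default is gone
end

puts pick_description({ 'og:description (property)' => 'From Open Graph' }).inspect
puts pick_description({}).inspect

paragraphs = ['First paragraph.', 'Second paragraph.']
puts paragraphs.join.inspect      # no separator: paragraphs run together
puts paragraphs.join(' ').inspect # a separator keeps word boundaries
puts [].join.inspect              # join never returns nil, so `|| ''` never applies
```

Whether a nil description or run-together paragraphs matter depends on how the caller and the indexer consume these values, which the commit does not show. Joining with a space would be one option if word boundaries turn out to be significant.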
