Commit

feat(auto_source): add support for auto_source feature (#676)
gildesmarais authored Oct 21, 2024
1 parent 8eeb9e6 commit 531dced
Showing 22 changed files with 1,093 additions and 40 deletions.
7 changes: 6 additions & 1 deletion Gemfile
@@ -4,19 +4,21 @@ source 'https://rubygems.org'

git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }

gem 'html2rss'
gem 'html2rss', '~> 0.14'
gem 'html2rss-configs', github: 'html2rss/html2rss-configs'

# Use these instead of the two above (uncomment them) when developing locally:
# gem 'html2rss', path: '../html2rss'
# gem 'html2rss-configs', path: '../html2rss-configs'

gem 'base64'
gem 'erubi'
gem 'parallel'
gem 'rack-cache'
gem 'rack-timeout'
gem 'rack-unreloader'
gem 'roda'
gem 'ssrf_filter'
gem 'tilt'

gem 'puma', require: false
@@ -33,7 +35,10 @@ group :development do
end

group :test do
gem 'climate_control'
gem 'rack-test'
gem 'rspec'
gem 'simplecov', require: false
gem 'vcr'
gem 'webmock'
end
21 changes: 20 additions & 1 deletion Gemfile.lock
@@ -11,8 +11,14 @@ GEM
addressable (2.8.7)
public_suffix (>= 2.0.2, < 7.0)
ast (2.4.2)
base64 (0.2.0)
bigdecimal (3.1.8)
byebug (11.1.3)
climate_control (1.2.0)
concurrent-ruby (1.3.4)
crack (1.0.0)
bigdecimal
rexml
crass (1.0.6)
diff-lcs (1.5.1)
docile (1.4.1)
@@ -25,6 +31,7 @@ GEM
faraday (>= 1, < 3)
faraday-net_http (3.3.0)
net-http
hashdiff (1.1.1)
html2rss (0.14.0)
addressable (~> 2.7)
faraday (> 2.0.1, < 3.0)
@@ -75,6 +82,8 @@ GEM
rack (3.1.7)
rack-cache (1.17.0)
rack (>= 0.4)
rack-test (2.1.0)
rack (>= 1.3)
rack-timeout (0.7.0)
rack-unreloader (2.1.0)
rainbow (3.1.1)
@@ -132,13 +141,18 @@ GEM
simplecov_json_formatter (~> 0.1)
simplecov-html (0.12.3)
simplecov_json_formatter (0.1.4)
ssrf_filter (1.1.2)
thor (1.3.2)
tilt (2.4.0)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unicode-display_width (2.5.0)
uri (0.13.1)
vcr (6.2.0)
webmock (3.24.0)
addressable (>= 2.8.0)
crack (>= 0.3.2)
hashdiff (>= 0.4.0, < 2.0.0)
yard (0.9.36)
zeitwerk (2.6.18)

@@ -151,13 +165,16 @@ PLATFORMS
x86_64-linux

DEPENDENCIES
base64
byebug
climate_control
erubi
html2rss
html2rss (~> 0.14)
html2rss-configs!
parallel
puma
rack-cache
rack-test
rack-timeout
rack-unreloader
rake
@@ -169,8 +186,10 @@ DEPENDENCIES
rubocop-rspec
rubocop-thread_safety
simplecov
ssrf_filter
tilt
vcr
webmock
yard

BUNDLED WITH
66 changes: 53 additions & 13 deletions README.md
@@ -45,9 +45,16 @@ services:
target: /app/config/feeds.yml
read_only: true
environment:
- RACK_ENV=production
- HEALTH_CHECK_USERNAME=health
- HEALTH_CHECK_PASSWORD=please-set-YOUR-OWN-veeeeeery-l0ng-aNd-h4rd-to-gue55-Passw0rd!
RACK_ENV: production
HEALTH_CHECK_USERNAME: health
HEALTH_CHECK_PASSWORD: please-set-YOUR-OWN-veeeeeery-l0ng-aNd-h4rd-to-gue55-Passw0rd!
# AUTO_SOURCE_ENABLED: true
# AUTO_SOURCE_USERNAME: foobar
# AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance
## to allow just requests originating from the local host
# AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
## to allow multiple origins, separate them with commas:
# AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld
watchtower:
image: containrrr/watchtower
volumes:
@@ -66,6 +73,31 @@ The [watchtower](https://containrrr.dev/watchtower/) service automatically pulls

The `docker-compose.yml` above contains a service description for watchtower.

## How to use automatic feed generation

> [!NOTE]
> This feature is disabled by default.

To enable the `auto_source` feature, uncomment the environment variables in the `docker-compose.yml` file above and change the values accordingly:

```yaml
environment:
## … snip ✁
AUTO_SOURCE_ENABLED: true
AUTO_SOURCE_USERNAME: foobar
AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance
## to allow just requests originating from the local host
AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
## to allow multiple origins, separate them with commas:
# AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld
## … snap ✃
```
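
The comma-separated `AUTO_SOURCE_ALLOWED_ORIGINS` value can be read into a set of hosts roughly as follows. This is a minimal sketch modeled on the `allowed_origins` and `check_request_origin!` helpers added in this commit; the method names `parse_allowed_origins` and `origin_allowed?` are hypothetical and used only for illustration.

```ruby
require 'set'

# Hypothetical helper: splits a comma-separated list of hosts into a Set,
# ignoring surrounding whitespace and empty entries.
def parse_allowed_origins(raw)
  raw.to_s.split(',').map(&:strip).reject(&:empty?).to_set
end

# Hypothetical helper: true when at least one of the given host values
# (for example the Host and X-Forwarded-Host headers) is in the allowed set.
def origin_allowed?(allowed, *hosts)
  allowed.intersect?(hosts.compact.to_set)
end

allowed = parse_allowed_origins('example.com, h2r.host.tld,')
puts origin_allowed?(allowed, 'example.com', nil)
puts origin_allowed?(allowed, '127.0.0.1:3000')
```

Under these assumptions, an empty or unset value yields an empty set, so no host matches it.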

Restart the container and open <http://127.0.0.1:3000/auto_source>.
When asked, enter your username and password.

Then enter the URL of a website and click on the _Generate_ button.
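
The helper added in this commit decodes the submitted URL with `Base64.urlsafe_decode64`. The sketch below shows only that encoding round trip; the method names `encode_url` and `decode_url` are hypothetical, and how the web form produces or transmits the encoded value is not shown here.

```ruby
require 'base64'

# Hypothetical helper: encodes a URL with the URL-safe Base64 alphabet.
def encode_url(url)
  Base64.urlsafe_encode64(url)
end

# Hypothetical helper: reverses encode_url, using the same call the
# auto_source helper in this commit uses for decoding.
def decode_url(encoded)
  Base64.urlsafe_decode64(encoded)
end

encoded = encode_url('https://example.com/news?page=1')
puts encoded
puts decode_url(encoded)
```

The URL-safe alphabet avoids `+` and `/`, which would otherwise need escaping inside a path or query string.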

## How to use the included configs

html2rss-web comes with many feed configs out of the box. [See the file list of all configs.](https://github.com/html2rss/html2rss-configs/tree/master/lib/html2rss/configs)
@@ -85,7 +117,7 @@ To build your own RSS feed, you need to create a _feed config_.\
That _feed config_ goes into the file `feeds.yml`.\
Check out the [`example` feed config](https://github.com/html2rss/html2rss-web/blob/master/config/feeds.yml#L9).

Please refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application that depends on html2rss.
Please refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application that builds on html2rss.

## Versioning and releases

@@ -112,15 +144,23 @@ If you're going to host a public instance, _please, please, please_:

### Supported ENV variables

| Name | Description |
| ------------------------------ | -------------------------------- |
| `PORT` | default: 3000 |
| `RACK_ENV` | default: 'development' |
| `RACK_TIMEOUT_SERVICE_TIMEOUT` | default: 15 |
| `WEB_CONCURRENCY` | default: 2 |
| `WEB_MAX_THREADS` | default: 5 |
| `HEALTH_CHECK_USERNAME` | default: auto-generated on start |
| `HEALTH_CHECK_PASSWORD` | default: auto-generated on start |
| Name | Description |
| ------------------------------ | ---------------------------------- |
| `BASE_URL` | default: '<http://localhost:3000>' |
| `LOG_LEVEL` | default: 'warn' |
| `HEALTH_CHECK_USERNAME` | default: auto-generated on start |
| `HEALTH_CHECK_PASSWORD` | default: auto-generated on start |
| | |
| `AUTO_SOURCE_ENABLED` | default: false |
| `AUTO_SOURCE_USERNAME` | no default |
| `AUTO_SOURCE_PASSWORD` | no default |
| `AUTO_SOURCE_ALLOWED_ORIGINS` | no default |
| | |
| `PORT` | default: 3000 |
| `RACK_ENV` | default: 'development' |
| `RACK_TIMEOUT_SERVICE_TIMEOUT` | default: 15 |
| `WEB_CONCURRENCY` | default: 2 |
| `WEB_MAX_THREADS` | default: 5 |

### Runtime monitoring via `GET /health_check.txt`

18 changes: 7 additions & 11 deletions app.rb
@@ -2,7 +2,6 @@

require 'roda'
require 'rack/cache'

require_relative 'roda/roda_plugins/basic_auth'

module Html2rss
@@ -12,12 +11,9 @@ module Web
#
# It is built with [Roda](https://roda.jeremyevans.net/).
class App < Roda
# TODO: move to helper
def self.development?
ENV['RACK_ENV'] == 'development'
end
CONTENT_TYPE_RSS = 'application/xml'

def development? = self.class.development?
def self.development? = ENV['RACK_ENV'] == 'development'

opts[:check_dynamic_arity] = false
opts[:check_arity] = :warn
@@ -33,16 +29,16 @@ def development? = self.class.development?
csp.script_src :self
csp.connect_src :self
csp.img_src :self
csp.font_src :self
csp.font_src :self, 'data:'
csp.form_action :self
csp.base_uri :none
csp.frame_ancestors :none
csp.frame_ancestors :self
csp.frame_src :self
csp.block_all_mixed_content
end

plugin :default_headers,
'Content-Type' => 'text/html',
'X-Frame-Options' => 'deny',
'X-Content-Type-Options' => 'nosniff',
'X-XSS-Protection' => '1; mode=block'

@@ -53,8 +49,9 @@ def development? = self.class.development?
handle_error(error)
end

plugin :hash_branches
plugin :hash_branch_view_subdir
plugin :public
plugin :content_for
plugin :render, escape: true, layout: 'layout'
plugin :typecast_params
plugin :basic_auth
@@ -69,7 +66,6 @@ def development? = self.class.development?

route do |r|
r.public

r.hash_branches('')

r.root { view 'index' }
4 changes: 2 additions & 2 deletions config.ru
@@ -4,8 +4,6 @@ require 'rubygems'
require 'bundler/setup'
require 'rack-timeout'

use Rack::Timeout

dev = ENV.fetch('RACK_ENV', nil) == 'development'

requires = Dir['app/**/*.rb']
@@ -26,6 +24,8 @@ if dev

run Unreloader
else
use Rack::Timeout

require_relative 'app'
requires.each { |f| require_relative f }

62 changes: 62 additions & 0 deletions helpers/auto_source.rb
@@ -0,0 +1,62 @@
# frozen_string_literal: true

require 'addressable'
require 'base64'
require 'html2rss'
require 'ssrf_filter'

module Html2rss
module Web
##
# Helper methods for handling auto source feature.
class AutoSource
def self.enabled? = ENV['AUTO_SOURCE_ENABLED'].to_s == 'true'
def self.username = ENV.fetch('AUTO_SOURCE_USERNAME')
def self.password = ENV.fetch('AUTO_SOURCE_PASSWORD')

def self.allowed_origins = ENV.fetch('AUTO_SOURCE_ALLOWED_ORIGINS', '')
.split(',')
.map(&:strip)
.reject(&:empty?)
.to_set

# @param encoded_url [String] Base64 encoded URL
# @return [RSS::Rss]
def self.build_auto_source_from_encoded_url(encoded_url)
url = Addressable::URI.parse Base64.urlsafe_decode64(encoded_url)
request = SsrfFilter.get(url)
headers = request.to_hash.transform_values(&:first)

auto_source = Html2rss::AutoSource.new(url, body: request.body, headers:)

auto_source.channel.stylesheets << Html2rss::RssBuilder::Stylesheet.new(href: '/rss.xsl', type: 'text/xsl')

auto_source.build
end

# @param rss [RSS::Rss]
# @param default_in_minutes [Integer]
# @return [Integer]
def self.ttl_in_seconds(rss, default_in_minutes: 60)
(rss&.channel&.ttl || default_in_minutes) * 60
end

# @param request [Roda::RodaRequest]
# @param response [Roda::RodaResponse]
# @param allowed_origins [Set<String>]
def self.check_request_origin!(request, response, allowed_origins = AutoSource.allowed_origins)
if allowed_origins.empty?
response.write 'No allowed origins are configured. Please set AUTO_SOURCE_ALLOWED_ORIGINS.'
else
origin = Set[request.env['HTTP_HOST'], request.env['HTTP_X_FORWARDED_HOST']].delete(nil)
return if allowed_origins.intersect?(origin)

response.write 'Origin is not allowed.'
end

response.status = 403
request.halt
end
end
end
end
11 changes: 10 additions & 1 deletion helpers/error_handlers.rb → helpers/handle_error.rb
@@ -1,5 +1,8 @@
# frozen_string_literal: true

require 'html2rss/configs'
require_relative '../app/local_config'

module Html2rss
module Web
class App
@@ -15,15 +18,21 @@ def handle_error(error) # rubocop:disable Metrics/MethodLength
when LocalConfig::NotFound,
Html2rss::Configs::ConfigNotFound
set_error_response('Feed config not found', 404)
when Html2rss::Error
set_error_response('Html2rss error', 422)
else
set_error_response('Internal Server Error', 500)
end

@show_backtrace = ENV.fetch('RACK_ENV', nil) == 'development'
@show_backtrace = self.class.development?
@error = error

set_view_subdir nil
view 'error'
end

private

def set_error_response(page_title, status)
@page_title = page_title
response.status = status
2 changes: 1 addition & 1 deletion helpers/handle_html2rss_configs.rb
@@ -7,7 +7,7 @@ def handle_html2rss_configs(request, _folder_name, _config_name_with_ext)
path = RequestPath.new(request)

Html2rssFacade.from_config(path.full_config_name, typecast_params) do |config|
response['Content-Type'] = 'text/xml'
response['Content-Type'] = CONTENT_TYPE_RSS
HttpCache.expires(response, config.ttl * 60, cache_control: 'public')
end
end