Skip to content

Concurrent and Incremental sitemap.xml builder for ruby applications

License

Notifications You must be signed in to change notification settings

marcosgz/site_maps

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SiteMaps

SiteMaps is a gem that helps you to generate sitemaps for your Rails application.

Installation

Add this line to your application's Gemfile:

gem 'site_maps'

And then execute:

bundle install

Or install it yourself as:

gem install site_maps

Usage

Create a configuration file where you will define the sitemap logic. You can use the following DSL to define the sitemap generation. Below is the minimum configuration required to generate a sitemap:

# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  process do |s|
    s.add('/', priority: 1.0, changefreq: "daily")
    s.add('/about', priority: 0.9, changefreq: "weekly")
  end
end

After creating the configuration file, you can run the following command to generate the sitemap:

SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue_all
  .run

or you can use the CLI to generate the sitemap:

bundle exec site_maps generate --config-file config/sitemap.rb

Configuration

Configuration can be defined using the configure block or by passing the configuration options to the use method. Each adapter may have specific configuration options, but the following options are common to all adapters:

  • url - URL of the main sitemap index file. This URL must ends with .xml or .xml.gz.
  • directory - Directory where the sitemap files will be stored.

Configuration using the #configure block

SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz"
    config.directory = "/home/www/public"
  end
  # define sitemap processes..
end

Configuration using #config method

SiteMaps.use(:file_system) do
  config.url = "https://example.com/sitemaps/sitemap.xml.gz"
  config.directory = "/home/www/public"
  # define sitemap processes..
end

Configuration passing options to the #use method

SiteMaps.use(:file_system, url: "https://example.com/sitemaps/sitemap.xml.gz", directory: "/home/www/public") do
  # define sitemap processes..
end

Refer to the adapter documentation to see the specific configuration options.

Gzip Compression

The sitemap files can be automatically compressed using the gzip algorithm. To enable the gzip compression, just pass the sitemap url with the .gz extension.

# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  process do |s|
    # Add sitemap links
  end
end

Sitemap Index

For small websites, you can use a single sitemap file to store all the links. However, for large websites with thousands of links, you should use a sitemap index file to store the sitemap links. This library will automatically generate the sitemap index file if you define multiple processes or if the amount of links exceeds the maximum limit of links or file size.

Criteria to generate the sitemap index file:

  • Multiple processes defined in the configuration file.
  • The amount of links exceeds the maximum limit of links (50,000 links).
  • The amount of news links exceeds the maximum limit of news links (1,000 links).
  • The uncompressed file size exceeds the maximum limit of file size (50MB).

Static and Dynamic Processes

Sitemap links are defined in the process block because the gem is designed to generate sitemaps for large websites in parallel. It means that each process will be executed in a separate thread, which will improve the performance of the sitemap generation.

Each process can have a unique name and a unique sitemap file location. By omitting the name and the file location, the process will use the :default value.

Bellow is an example of a configuration file with multiple processes:

# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  # Static Processes
  process do |s|
    s.add('/', priority: 1.0, changefreq: "daily")
    s.add('/about', priority: 0.9, changefreq: "weekly")
  end
  process :categories, "categories/sitemap.xml" do |s|
    Category.find_each do |category|
      s.add(category_path(category), priority: 0.7)
    end
  end
  # Dynamic Processes
  process :posts, "posts/%{year}-%{month}/sitemap.xml", year: Date.today.year, month: Date.today.month do |s, year:, month:|
    Post.where(year: year.to_i, month: month.to_i).find_each do |post|
      s.add(post_path(post), priority: 0.8)
    end
  end
end

Dynamic process are defined by passing a process name, a location, and a list of extra arguments that will be dinamically replaced by the given values in the enqueue method.

Location can contain placeholders that will be replaced by the values passed to the process block(The %{year} and %{month} of example bellow). Both relative and absolute paths are supported. Note that when using relative paths, the base dir of main sitemap index file will be used as the root directory.

It will let you enqueue the same process multiple times with different values.

SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue(:posts, year: "2021", month: "01")
  .enqueue(:posts, year: "2021", month: "02")
  .enqueue_remaining # Enqueue all remaining processes (default and categories)
  .run

Important Considerations:

  • The values of the extra arguments may be strings when they are coming from the CLI or other sources.
  • By omitting the extra arguments, the process will be enqueued with the default values defined in the configuration file. So make sure you define default values or properly add nil checks in the process block to avoid errors.

Sitemap Extensions

The sitemap builder supports the following sitemap extensions:

You can add the sitemap links with the extensions by passing a hash with the extension name as the key and the extension attributes as the value.

Image

Images can be added to the sitemap links by passing images attributes to the add method. The images attribute should be an array of hashes with the image attributes.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      images: [
        { loc: "https://example.com/image.jpg" }
      ],
    )
  end
end

Supported attributes:

  • loc - URL of the image.
  • caption - Image caption.
  • geo_location - Image geo location.
  • title - Image title.
  • license - Image license.

Video

Videos can be added to the sitemap links by passing videos attributes to the add method. The videos attribute should be an array of hashes with the video attributes.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      videos: [
        {
          thumbnail_loc: "https://example.com/thumbnail.jpg",
          title: "Video Title",
          description: "Video Description",
          content_loc: "https://example.com/video.mp4",
          player_loc: "https://example.com/player.swf",
          allow_embed: "yes",
          autoplay: "ap=1",
          # ...
        }
      ],
    )
  end
end

Supported attributes:

  • thumbnail_loc - URL of the thumbnail image.
  • title - Title of the video.
  • description - Description of the video.
  • content_loc - URL of the video content.
  • player_loc - URL of the video player.
  • allow_embed - Allow embed attribute of the player location.
  • autoplay - Autoplay attribute of the player location.
  • duration - Duration of the video in seconds.
  • expiration_date - Expiration date of the video.
  • rating - Rating of the video.
  • view_count - View count of the video.
  • publication_date - Publication date of the video.
  • tags - Tags of the video.
  • tag - Single tag of the video.
  • category - Category of the video.
  • family_friendly - Family friendly attribute of the video.
  • gallery_loc - URL of the video gallery.
  • gallery_title - Title of the video gallery.
  • uploader - Uploader of the video.
  • uploader_info - Uploader info of the video.
  • price - Price of the video.
  • price_currency - Currency of the video price.
  • price_type - Type of the video price.
  • price_resolution - Resolution of the video price.
  • live - Live attribute of the video.
  • requires_subscription - Requires subscription attribute of the video.

PageMap

PageMap sitemaps can be added to the sitemap links by passing pagemap attributes to the add method. The pagemap attribute should be a hash with the pagemap attributes.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      pagemap: {
        dataobjects: [
          {
            type: "document",
            id: "1",
            attributes: [
              { name: "title", value: "Page Title" },
              { name: "description", value: "Page Description" },
              { name: "url", value: "https://example.com" },
            ]
          }
        ]
      }
    )
  end
end

Supported attributes:

  • dataobjects - Array of hashes with the data objects.
    • type - Type of the object.
    • id - ID of the object.
    • attributes - Array of hashes with the attributes.
      • name - Name of the attribute.
      • value - Value of the attribute.

News

News sitemaps can be added to the sitemap links by passing news attributes to the add method. The news attribute should be a hash with the news attributes.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      news: {
        publication_name: "Publication Name",
        publication_language: "en",
        publication_date: Time.now,
        genres: "PressRelease",
        access: "Subscription",
        title: "News Title",
        keywords: "News Keywords",
        stock_tickers: "NASDAQ:GOOG",
      }
    )
  end
end

Supported attributes:

  • publication_name - Name of the publication.
  • publication_language - Language of the publication.
  • publication_date - Publication date of the news.
  • genres - Genres of the news.
  • access - Access of the news.
  • title - Title of the news.
  • keywords - Keywords of the news.
  • stock_tickers - Stock tickers of the news.

Alternates

You can add alternate links to the sitemap links by passing alternates attributes to the add method. The alternates attribute should be an array of hashes with the alternate attributes.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      alternates: [
        { href: "https://example.com/en", lang: "en" },
        { href: "https://example.com/es", lang: "es" },
      ],
    )
  end
end

Supported attributes:

  • href - URL of the alternate link. (Required)
  • lang - Language of the alternate link. (Optional)
  • nofollow - Nofollow attribute of the alternate link. (Optional)
  • media - Media targets for responsive design pages. (Optional)

Mobile

Mobile sitemaps include a specific mobile:mobile/ tag.

Check out the Google specification here.

config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add('/', mobile: true)
  end
end

Supported attributes:

  • mobile - Mobile attribute of the sitemap link.

Adapters

The gem provides adapters to store the sitemaps in different locations. The following adapters are available:

  • File System
  • AWS S3

File System

You can use the file system adapter to store the sitemaps in the file system. The configuration is simple, you just need to provide the directory where the sitemaps will be stored.

SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz"
    config.directory = "/home/www/public"
  end
  process do |s|
    # Add sitemap links
  end
end

AWS S3

You can use the AWS S3 adapter to store the sitemaps in an S3 bucket. The configuration is similar to the file system adapter, but you need to provide the AWS SDK options.

SiteMaps.use(:aws_sdk) do
  configure do |config|
    config.url = "https://my-bucket.s3.amazonaws.com/sitemaps/sitemap.xml"
    config.directory = "/tmp" # Local directory to store the sitemaps before uploading to S3
    # AWS S3 specific options
    config.bucket = "my-bucket"
    config.region = "us-east-1"
    config.aws_access_key = ENV["AWS_ACCESS_KEY_ID"]
    config.aws_secret_key = ENV["AWS_SECRET_ACCESS_KEY"]
    # Optional parameters (default values)
    config.acl = "public-read"
    config.cache_control = "private, max-age=0, no-cache"
  end
  process do |s|
    # Add sitemap links
  end
end

If you want to let your rails application as a proxy to the sitemap files, you can create a controller to serve the sitemap files from the S3 bucket.

# config/routes.rb
get "sitemaps/*relative_path", to: "sitemaps#show", as: :sitemap
# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def show
    location = params.permit("relative_path", "format").to_h.values.join(".")

    unless location =~ /\.xml(\.gz)?$/ # You may want add more validations here
      raise ActionController::RoutingError, "Not found"
    end

    data, meta = SiteMaps.current_adapter.read(File.join("sitemaps", location))
    if location.ends_with?(".xml")
      render xml: data
    else
      send_data(data, disposition: "attachment", type: meta[:content_type])
    end
  rescue SiteMaps::FileNotFoundError
    raise ActionController::RoutingError, "Not found"
  end
end

Make sure to let sitemap config in the initializer. You may want to add some caching to avoid hitting the S3 bucket on every request.

Custom Adapters

You can create custom adapters to store the sitemaps in different locations. You just need to create a class that implements the SiteMaps::Adapters::Adapter interface. The adapter should implement the following methods:

  • write(url, raw_data, **extra) - Write the sitemap data to the storage.
  • read(url) - Read the sitemap data from the storage.
  • delete(url) - Delete the sitemap data from the storage.
class MyAdapter < SiteMaps::Adapters::Adapter
  def write(url, raw_data, **extra)
    # Write the sitemap data to the storage
  end

  def read(url)
    # Read the sitemap data from the storage
  end

  def delete(url)
    # Delete the sitemap data from the storage
  end
end

SiteMaps.use(MyAdapter, **config) do
  process do |s|
    # Add sitemap links
  end
end

Adapter Configuration

If you adapter requires additional configuration, you can define a <adapter class>::Config inheriting from SiteMaps::Configuration and implement the required configuration options.

class MyAdapter::Config < SiteMaps::Configuration
  attribute :api_key, default: -> { ENV["MY_API_KEY"] }
end

During the adapter initialization, it will automatically detect the configuration class and use it to load the configuration options.

SiteMaps.use(MyAdapter) do
  configure do |config|
    # ...
    config.api_key = "my-api-key"
  end
  process do |s|
    # Add sitemap links
  end
end

CLI

You can use the CLI to generate the sitemap. The CLI will load the configuration file and run the sitemap generation.

bundle exec site_maps generate --config-file config/sitemap.rb

To enqueue dynamic processes, you can pass the process name with the context values.

bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2021 month:1

Enqueue dynamic + remaining processes

bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2021 month:1 \
  --enqueue-remaining

passing max threads to run the processes in parallel

bundle exec site_maps generate \
  --config-file config/sitemap.rb \
  --max-threads 10

Notification

You can subscribe to the internal events to receive notifications about the sitemap generation. The following events are available:

  • sitemaps.enqueue_process - Triggered when a process is enqueued.
  • sitemaps.before_process_execution - Triggered before a process starts execution
  • sitemaps.process_execution - Triggered when a process finishes execution.
  • sitemaps.finalize_urlset - Triggered when the sitemap builder finishes the URL set.

You can subscribe to the events using the following code:

SiteMaps::Notification.subscribe("sitemaps.enqueue_process") do |event|
  puts "Enqueueing process #{event.payload[:name]}"
end

We have the standard event handler SiteMaps::Runner::EventListener that will print the events to the standard output. You can use it to view the progress of the sitemap generation.

SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue_all
  .run

Mixins

You can use mixins to extend the sitemap builder with additional methods. The mixins can be used to define common methods that will be used in multiple processes.

module MyMixin
  def repository
    Repository.new
  end

  def post_path(post)
    "/posts/#{post.slug}"
  end
end

SiteMaps.use(:file_system) do
  extend_processes_with(MyMixin)
  process do |s|
    s.repository.posts.each do |post|
      s.add(s.post_path(post), priority: 0.8)
    end
  end
end

We already have a built-in mixin for Rails applications that provides the url helpers through the route method.

  process do |s|
    s.add(s.route.root_path, priority: 1.0)
    s.add(s.route.about_path, priority: 0.9)
  end
end

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/marcosgz/site_maps.

License

The gem is available as open source under the terms of the MIT License.

About

Concurrent and Incremental sitemap.xml builder for ruby applications

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published