
GoodJob is using a lot of memory / RAM #613

Closed · rgaufman opened this issue May 29, 2022 · 7 comments

rgaufman commented May 29, 2022

Hi there,

I have my good_job set up like this:

    config.good_job = {
      preserve_job_records: false,
      retry_on_unhandled_error: false,
      on_thread_error: ->(exception) { Airbrake.notify(exception) },
      execution_mode: :external,
      queues: '*',
      max_threads: 10,
      poll_interval: 15,
      shutdown_timeout: 25,
      enable_cron: true,
      logger: Rails.logger
    }

Here is a screenshot of it in action:

[Screenshot 2022-05-29 at 20 01 51]

There are rarely more than 3-4 jobs running in parallel.

Compared to Sidekiq running the same jobs, it seems to take almost double the RAM, and looks like this in top:

[Screenshot 2022-05-29 at 20 02 55]

Any ideas what could be responsible for this? Should I disable the dashboard? How would I troubleshoot and reduce RAM usage?

rgaufman (Author) commented May 29, 2022

I noticed strange behaviour: when starting Puma and good_job, both jump to 30% RAM use. Puma then drops quite quickly and settles at around 9%, while good_job stays at around 29% (Sidekiq previously used roughly the same as Puma with the same workers). This is 10 minutes later:

[Screenshot 2022-05-29 at 20 38 41]

(This is with the dashboard disabled and preserve_job_records: false)

Any ideas how to troubleshoot this and figure out why so much RAM is being used?

bensheldon (Owner) commented

@rgaufman thanks for sharing this! I haven't done much memory profiling of GoodJob, though I'm surprised that GoodJob would perform significantly differently than Puma. Take the following as speculation (because I haven't looked)...

GoodJob's initialization process is pretty simple:

  1. Boots the Rails framework:
    require RAILS_ENVIRONMENT_RB
    return unless GoodJob::CLI.log_to_stdout? && !ActiveSupport::Logger.logger_outputs_to?(GoodJob.logger, $stdout)
  2. Initializes a bunch of Concurrent Ruby objects (ThreadExecutors, Timers):
    notifier = GoodJob::Notifier.new
    poller = GoodJob::Poller.new(poll_interval: configuration.poll_interval)
    scheduler = GoodJob::Scheduler.from_configuration(configuration, warm_cache_on_initialize: true)
    notifier.recipients << [scheduler, :create_thread]
    poller.recipients << [scheduler, :create_thread]
    cron_manager = GoodJob::CronManager.new(configuration.cron_entries, start_on_initialize: true) if configuration.enable_cron?
  3. Queries for existing and future jobs to cache <-- this is a potential culprit, but when I built the feature I saw it use ~20MB per 10k cached items.
  4. Anytime an execution thread is awakened (e.g. to execute a job immediately over LISTEN/NOTIFY, a scheduled job from the cache, or every poll_interval), the work takes place inside a Rails Executor, which should be similar to how Puma works. One thing to flag here is that the poll_interval wakes up a thread and makes an ActiveRecord query every N seconds (see the configuration sketch below).
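For reference, a minimal sketch of reducing that polling overhead using the same config.good_job options shown earlier in this thread; treat the exact defaults and the negative-value behaviour as things to verify against the README for your installed version:

# config/initializers/good_job.rb -- sketch only
Rails.application.configure do
  # Widen the interval so the fallback poller wakes (and queries) less often;
  # LISTEN/NOTIFY still picks up newly enqueued jobs immediately.
  config.good_job.poll_interval = 30

  # A negative value is documented as disabling polling entirely,
  # leaving LISTEN/NOTIFY and the scheduled-job cache:
  # config.good_job.poll_interval = -1
end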

Because I haven't done any memory profiling, I can't give you anything specific about how to dig into the problem. I think wrapping the initialization with stackprof and looking at allocated objects would be a good place to start (a sketch follows). It doesn't sound like the problem is object retention (i.e. memory usage increasing over time), just that more memory is used than expected in the first place.
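For anyone following that suggestion, here is a minimal sketch of allocation profiling around boot and around a single job. It assumes the stackprof and memory_profiler gems are added to the Gemfile (neither ships with GoodJob), and MyJob is a hypothetical stand-in for whatever job looks heavy:

# profile_memory.rb -- run from the application root, sketch only
require "stackprof"
require "memory_profiler"

# Object-allocation profile of booting the Rails environment:
StackProf.run(mode: :object, raw: true, out: "tmp/stackprof-boot.dump") do
  require_relative "config/environment"
end
# Inspect afterwards with: bundle exec stackprof tmp/stackprof-boot.dump --text

# Allocated/retained memory report around a single job execution:
report = MemoryProfiler.report do
  MyJob.perform_now # hypothetical job class
end
report.pretty_print(scale_bytes: true)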

BTW, here's memory usage for my reference application.

  • Web: Puma with web concurrency 1, Rails max threads 5:
    [Screen Shot 2022-05-30 at 7 53 32 AM]
  • Worker: GoodJob with 4 threads (good job queues = -newsletters:2;newsletters:2):
    [Screen Shot 2022-05-30 at 7 53 41 AM]

sedubois commented Jul 2, 2022

Since upgrading our app from GoodJob 2.15.1 to 3.0.0, our Heroku worker memory has been spiking.

Worker memory, past 2 hours:
[Screenshot 2022-07-02 at 13 26 02]
Worker memory, past 24 hours:
[Screenshot 2022-07-02 at 13 22 48]
Worker memory, past 72 hours:
[Screenshot 2022-07-02 at 12 56 15]

The web dyno's memory usage hasn't changed. Other dependencies were also upgraded (Zeitwerk, Rack, ...), but GoodJob is the main suspect: it's the worker memory that has increased, and an inspection of the latest memory spikes shows a close overlap with the execution times of the latest jobs in the GoodJob dashboard:

GoodJob 'Scheduled At' time (UTC)    Memory spike time (1 min or 10 min resolution) (UTC)
10:49:18                             10:49-10:51
10:36:50                             10:37-10:39
09:52:47                             09:52-09:59:00
09:28:26, 09:29:44                   09:20-09:30
08:21, 08:22, 08:25, 08:31           08:20-08:30
07:42:40, 07:43:10, 07:46:38         07:40-07:50
07:12:07, 07:19:40                   07:10-07:20
06:25:34, 06:27:04                   06:20-06:30
05:45:43, 05:50:42                   05:40-05:50

The above table shows that jobs are infrequent. They either send some emails or make some API calls.

Here is the GoodJob initializer, for reference (in this release we also changed the syntax from GoodJob.X = Y to Rails.application.configure do ... end):

# app/config/good_job.rb

Rails.application.configure do
  config.good_job.preserve_job_records = true
  config.good_job.retry_on_unhandled_error = false
  config.good_job.on_thread_error = -> (exception) do
    Sentry.set_extras(good_job_on_thread_error: true)
    Sentry.capture_exception(exception)
  end
end

ActionMailer::MailDeliveryJob.retry_on StandardError,
                                       wait: :exponentially_longer, attempts: Float::INFINITY
ActionMailer::MailDeliveryJob.around_perform do |_job, block|
  block.call
rescue StandardError => e
  Sentry.set_extras(action_mailer_standard_error: true)
  Sentry.capture_exception(e)
  raise
end

Any idea if/how GoodJob could cause this?

bensheldon (Owner) commented

@sedubois yikes! That's not good.

I can't think of anything off the top of my head that would have blown out memory.

Looking through the Changelog, the major changes are:

  • Converting GoodJob to a singular engine. I don't think this would vastly affect memory, though it does mean a few more controllers might be eager-loaded into memory at boot.
  • Zeitwerk configuration changes. I can't imagine anything here changing significantly unless Zeitwerk itself was doing something very unexpected, and I wouldn't expect that.
  • Automatic deletion of old jobs became enabled by default in GoodJob 3.0; every 10 minutes it deletes old jobs. I think this could be the culprit, but I tried to do as much of that work as possible in the database rather than in the app (see the configuration sketch below).
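To help rule that cleanup feature out, here is a minimal sketch of tuning it. The option names are taken from the GoodJob 3.x configuration accessors and the values mirror the defaults described above; double-check them against the README for your installed version:

# config/initializers/good_job.rb -- sketch only
Rails.application.configure do
  # How old a preserved job must be before cleanup deletes it (seconds).
  config.good_job.cleanup_preserved_jobs_before_seconds_ago = 14.days.to_i

  # How often cleanup runs, by number of executed jobs and by elapsed time.
  # Raising these makes the worker perform cleanup work less frequently.
  config.good_job.cleanup_interval_jobs = 1_000
  config.good_job.cleanup_interval_seconds = 10 * 60
end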

Let me do another sweep through the code and see what I can find.

bensheldon (Owner) commented

@sedubois I found a possible culprit in #652, which has been released in v3.0.1. Please give that a try and let me know if it fixes your problem.

sedubois commented Jul 3, 2022

@bensheldon yesterday I downgraded GoodJob from 3.0.0 to 2.15.1 and the memory issue disappeared. Today I upgraded to 3.0.1 and there haven't been any memory spikes since either. So it seems solved, thanks! 👍

bensheldon (Owner) commented

@sedubois fantastic! 🎉
