Cronjobs Monitoring - Your feedback needed #42283

therealarkin · 2022-12-13T04:15:58Z

Hi Folks!

Eran here, a member of Sentry’s product team. We are constantly thinking about ways to make your life easier, and one of the areas we are thinking about is Cronjobs or recurring tasks!

We would love to understand how you all use Cronjobs and what type of monitoring is currently done, if any. Some questions to get the conversation going -

1. What kinds of Cronjobs do you have?
2. What processes/tools/products are you using, if any, to monitor them?
3. What’s working/not working with the current ways you monitor them? And why?
4. How often do cronjobs fail? Are failures typically infrastructure or code related? Is fixing failed cronjobs a high priority?

Christophvh · 2022-12-13T09:53:02Z

1. Most cron jobs are for heavy actions like generating invoices for a bunch of new subscriptions at night. There are also a lot of clean-up jobs, like clearing password resets or old files every night. Other cron jobs fix missing data or act as a fallback for sending mails: for example, a job that checks whether the 'sent_at' column on an invoices table is null and, if so, makes sure the mail gets sent again. Imports/exports are also very common, for example importing Excel files from an SFTP server.
2. https://ohdear.app/ & https://thenping.me/
3. Monitoring for our NodeJS projects is not working in depth. The above tools are good for Laravel projects: since those all do cron jobs the same way, it is easy to write a monitoring tool that catches them.
4. Depends on the project, mostly code related.

2 replies

therealarkin Dec 13, 2022
Author

@Christophvh, thanks for your response. Super helpful!!
When you write, "Monitoring for our NodeJS projects is not working in depth," what do you mean by depth? What is "deep" vs. "shallow" in that context? And what type of "depth" is useful?

Christophvh Dec 14, 2022

'Depth' in this case is the level of detail on the cron job and depth on why and where it fails.

This is just the nature of not working with a framework for our NodeJS projects. We also have Laravel projects and since that is a framework, Sentry or other monitoring tools can obviously set better defaults. For example: the error tracking with sentry for Laravel works out of the box without any config. While in our NodeJS projects we have to do a lot of manual setup to get good error tracking in Sentry. And this is the same with other tools in regard to cron monitoring. The above tools are pretty much made for Laravel.

In our Node projects, we use BullMQ for cron jobs, which makes cron jobs look like regular jobs.

danielputerman · 2022-12-13T10:40:57Z

1. We use Cronjobs mainly for production monitoring (de facto running a test suite on production SDKs) to make sure nothing slipped under the radar.
2. It's a CI definition (we use GA).
3. Since it's a simple use case, no specific issues.
4. As it usually means something is broken on prod, a job failure is a high priority. When it happens it is mostly code related, and once in a while it's an update of the runtime env that broke backward compatibility, which is something we should also be aware of.

1 reply

therealarkin Dec 13, 2022
Author

@danielputerman really helpful; thanks! So you define the job inside GitHub Actions and use that for monitoring?

rjo100 · 2022-12-16T00:05:20Z

This is all experience from my previous job:
What kinds of Cronjobs do you have?

Mostly tasks that sync with 3rd parties, billing, polling for domain/IP changes, things like that
What processes/tools/products are you using, if any, to monitor them?
Dead Man's Snitch + regular APM stuff + Datadog tags + Alerts
What’s working/not working with the current ways you monitor them? And why?
Could get noisy when something failed the wrong way because it would keep running over and over
How often do cronjobs fail? Are failures typically infrastructure or code related? Is fixing failed cronjobs a high priority?
Usually code related, but often the 3rd party API was acting up. Generally yes as this was something that was fault tolerant obviously but only up to a point. Usually just needed fixing that day but not considered an active outage (i.e. status page update)

0 replies

jmduke · 2022-12-21T14:11:32Z

This is an exciting idea!

What kinds of Cronjobs do you have?

I run my crons through Heroku Scheduler + django-cron. They're all meant to be relatively short-lived (the main 'action' they take is to dump asynchronous work in to worker queues rather than to do the work themselves), and cover a wide variety of uses:

Running automated checker/invariant infrastructure
Sending lifecycle emails or managing subscriptions
Generating OLAP-ish data out-of-band for analytics
Regenerating Oauth tokens and syncing with various third-party services

What processes/tools/products are you using, if any, to monitor them?

The core 'cron runner' hooks into my workload database so I do get failed entries if something fails (and exceptions go to Sentry); beyond that, very little.

What’s working/not working with the current ways you monitor them? And why?

I'd complain about two things:

Performance with crons is more important than performance elsewhere in the application, and I don't have any great primitives to set SLAs on specific crons ("email me if this cron takes more than five minutes; page me if it takes more than ten", that kind of thing)
Right now, failures and exceptions bubble up in the same way any other application error does, and I wish I could highlight them specifically

How often do cronjobs fail? Are failures typically infrastructure or code related? Is fixing failed cronjobs a high priority?

Code, rather than infra. (Or more accurately — "exogenous changes such as a third party provider", rather than infra.) And yes.

2 replies

therealarkin Dec 27, 2022
Author

Thanks Justin!

I'm curious why "Performance with crons is more important than performance elsewhere in the application"? In my mental model, cron jobs aren't user-facing, and therefore it's OK for them to run a little longer.

We are totally looking into ways of separating a Cron error from other errors! We will update you here!

jmduke Jan 4, 2023

@therealarkin I probably should have phrased that as "SLA slippage for crons is more important than SLA slippage for other arbitrary async processes"!

johnvictorfs · 2022-12-23T13:06:26Z

1. Video encoding/processing, sending scheduled and periodic emails and Slack messages, conditionally changing some statuses that are supposed to change back every X days.
2. We currently run our jobs with Heroku Scheduler by running Rails rake tasks, and our only monitoring is Sentry error tracking, so barely any. We are currently working on moving them to GitHub Actions cron jobs, and possibly replacing rake tasks with Sidekiq + Redis, which we already use for more instant but still background jobs (and we have somewhat proper monitoring and logs for those).
3. It's usually fine for knowing whether something has crashed, but it's hard to notice when something has gone wrong without actually crashing any process, since we only have exception tracking (with Sentry). There's also not much logging. That hasn't been problematic yet, since we try to keep the jobs as simple as possible, but it could become a concern if something were to go very wrong with some of those tasks.
4. Very unlikely, but when it happens they are mostly code issues. Sometimes in the past we have hit rate limits with services that don't error out, so if we didn't have checks for that, they would start failing without us noticing, but there were no major issues when that happened.

1 reply

therealarkin Dec 27, 2022
Author

Thanks John!

For #3: do you mind trying out the new Crons feature at Sentry? We just released it at https://sentry.io/crons and I wonder if it can provide the visibility you need!

cquanu · 2022-12-27T15:26:43Z

Thanks for this feature. I think the implementation could be better or easier without CLI, just a simple HEAD, GET, or a POST request to the following URL like this example will do.

16 replies

dcramer Jan 6, 2023
Maintainer

We'll make sure to get the change into prod on sentry.io tomorrow! Everyone still returning from holidays so it didn't get accidentally deployed with other things ;)

cleptric Jan 6, 2023
Collaborator

@modernben Great stuff! We're also planning to bring this into the Laravel SDK at one point.
getsentry/sentry-laravel#628

If you're interested in seeing this, PRs are welcome :)

shaedrich Jun 9, 2023

@cleptric Would be cool if sentryMonitor() would take optional arguments like setting the monitor context so one wouldn't have to do this manually. Since only the $monitorSlug is required, this can be as easy as a boolean flag because the slug is already passed.

\Sentry\configureScope(function (\Sentry\State\Scope $scope): void {
    $scope->setContext('monitor', [
        'slug' => '<monitor-slug>',
    ]);
});

It would also be nice if one didn't have to pass $monitorSlug at all when using the name() method.

cleptric Jun 12, 2023
Collaborator

We're working on a new feature that will mark setting the monitoring slug as a context obsolete.
There is also something going on regarding "upserting" cron monitors. You can follow along at getsentry/sentry-laravel#677.

shaedrich Jun 12, 2023

Ah, thanks 👍🏻

MartinMarx · 2022-12-27T16:28:21Z

Hey Eran,

I tried this new feature today and I have some feedback:

1. It looks like we cannot filter the Cron Monitors view by environment as on other pages (e.g. production / staging). Is that intentional? For example, we may want to test our cron jobs on a staging environment before going to production, without polluting the whole monitoring data.
2. It would be great to be able to add some extra data to a check-in, for example the result of the cron job execution (maybe we could have a "detail" view for each check-in?).
3. Is any SDK integration planned? For now we build our own HTTP requests to the Sentry API, but a simpler integration with the sentry-javascript SDK would be nice.
4. It looks like errors thrown during the execution of the cron job are not linked to the cron monitor. A new Monitor failure issue is created when a job fails, but the exception itself is not linked with the job. Is there any way to do this?

Anyway I'm very excited to see where this new feature is going, great job!

4 replies

therealarkin Dec 27, 2022
Author

Thanks, @MartinMarx, for the detailed feedback and kind words.

These are some great ideas right here!

@gaprl: Do you mind creating a GitHub issue for #1 when you have a chance? Let's see if we can prioritize it in the next planning period.

@MartinMarx - what will you do with the extra data? What type of responses are you expecting?

As for 3 & 4, we are thinking about ways to make the Crons feature more integrated with the core Sentry workflow, and we will update here soon!

MartinMarx Dec 28, 2022

Hey @therealarkin,

About the extra data: for example, we may have a job that cleans some data at night, and we may want to know which data has been successfully removed. It's just an example; maybe it's not relevant for the Cron monitoring feature.

Anyway thanks for your answer!

therealarkin Dec 28, 2022
Author

Thank you @MartinMarx! Do you mind emailing us an example of a response to crons-feedback at sentry dot io?

gaprl Jan 4, 2023
Maintainer

Hey @MartinMarx, wanted to let you know multi-environment support is in our roadmap and you can keep track of its progress here: #42788

foobarna · 2023-01-03T18:00:13Z

Hello,

What kinds of Cronjobs do you have?
Ingesting data files, syncing tables, process new data, read SFTPs.
What processes/tools/products are you using, if any, to monitor them?
http://cronitor.io/
What’s working/not working with the current ways you monitor them? And why?
Would like to keep the errors and monitoring in the same place.
How often do cronjobs fail? Are failures typically infrastructure or code related? Is fixing failed cronjobs a high priority?
Code and data related issues. Depends on the jobs: most of them are not a high priority if it's just one failure and the next run succeeds. Those that run once or twice per week need fixing, and then a run has to be triggered.

Tried to set up a couple of our cronjobs with Sentry Crons to make a quick evaluation and, besides what was already written, I have just one complaint:
Allow (somehow) to chain multiple commands that compose 1 cron definition. For example, with Cronitor we have this command in cron.d:
0 18 * * * ec2-user cronitor exec XXXX "is_auto_enabled && cd /my/dir && run_cmd"
but trying the same with
8,23,38,53 * * * * ec2-user sentry-cli monitors run XXX -- is_auto_enabled && cd /my/dir && run_cmd
we get the run status of is_auto_enabled only, and the other two commands are run after sentry-cli monitors. Trying the same command quoted with " as for cronitor fails in sentry-cli with
error could not invoke program 'is_auto_enabled && cd /my/dir && run_cmd': No such file or directory (os error 2).

It can be tested with sleep commands: sentry-cli monitors run XXXX -- "sleep 1 && echo after1 && sleep 10".

As others pointed out, it would be nice to send some accompanying data related to that cron run, either from the code with the Sentry SDK or in the form of sentry-cli monitors run XXXX -- "sleep 1 && echo after1 && sleep 10" && sentry-cli monitors <send> XXXX "`tail -n 1 some/log/file.log`".

6 replies

foobarna Jan 4, 2023

Yes, I was made aware of that, but in the time I had available I did not find a way of chaining commands in the style of && for sentry-cli monitors run. I made the comparison with cronitor because it accepts the command to run as a single positional string argument.

dcramer Jan 5, 2023
Maintainer

You might be able to use e.g. bash directly to run a series of commands like you're trying above (I believe bash -c "echo 1 && echo 2" should suffice)
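A minimal sketch of why the `bash -c` wrapper helps. `monitor_run` is a hypothetical stand-in for `sentry-cli monitors run XXX --`: it only executes its arguments as one program and prints markers around it, so the difference can be seen without a Sentry account.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `sentry-cli monitors run <slug> --`:
# runs its arguments as a single program and marks where monitoring starts and ends.
monitor_run() {
  echo "monitor: start"
  "$@"
  echo "monitor: end (exit status $?)"
}

# Unquoted: the calling shell splits the line at &&, so only the first
# command is monitored and `echo step2` runs after the monitor has ended.
unquoted=$(monitor_run echo step1 && echo step2)

# Wrapped in bash -c: the whole chain is passed as one program and is
# monitored from start to end.
wrapped=$(monitor_run bash -c 'echo step1 && echo step2')

printf '%s\n---\n%s\n' "$unquoted" "$wrapped"
```

Applied to the crontab line from the report, this would presumably read `sentry-cli monitors run XXX -- bash -c "is_auto_enabled && cd /my/dir && run_cmd"`; that adaptation is untested here.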

foobarna Jan 5, 2023

@dcramer thanks for the suggestion, will try a bit later and come back.

dcramer Jan 5, 2023
Maintainer

Aside from a triage POV, if there's a way we can improve sentry-cli here I'm in favor. Asking folks to figure out cryptic bash is always a nightmare. It took me ages to even google the correct syntax, and I gave up and just mashed in my terminal to make sure it worked.

foobarna Jan 5, 2023

Ideally, sentry-cli should handle at least one of the scenarios below. Agreed that trying to figure out how to modify the existing cron commands is not a pleasure.

swanson · 2023-01-05T19:42:42Z

Ran into this issue while trying out the beta:

Requests to https://sentry.io/api/0/monitors/#{monitor_id}/checkins/#{id} fail with 404 errors -- you need a trailing / at the end of the URL for it to work.

404s: PUT https://sentry.io/api/0/monitors/123/checkins/456
Works: PUT https://sentry.io/api/0/monitors/123/checkins/456/

Same behavior for the POST endpoint to start a check.
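For anyone building these URLs by hand, a small guard against the missing slash might look like the sketch below. `ensure_trailing_slash` is a hypothetical helper name, and the URL is the example from this report.

```shell
# Hypothetical helper: append a trailing slash unless the URL already ends with one.
ensure_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

checkin_url=$(ensure_trailing_slash "https://sentry.io/api/0/monitors/123/checkins/456")
echo "$checkin_url"
```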

2 replies

dcramer Jan 5, 2023
Maintainer

That's expected, albeit confusing. Were our docs wrong at all?

We might be able to improve this in the API as a whole, but today they require correct URIs.

swanson Jan 5, 2023

The docs were correct. Since this isn't integrated into the sentry libraries yet, people will need to make manual HTTP calls so just wanted to raise the issue. I spent 15 minutes trying to figure out why I was getting a 404 :)

swanson · 2023-01-05T19:46:27Z

It would be really nice to be able to specify an optional "Label" on a check-in. This could be used to do things like adding a date, a server identifier, etc

5 replies

dcramer Jan 5, 2023
Maintainer

What behavior would you expect with the label? Always be unique? Basically, are you trying to sum(checkins) for a given monitor? It's not really a case we defined early on, but otherwise a label might be fine.

swanson Jan 6, 2023

I'm looking for just a little bit of traceability about where the check-in originated and a way to visually find a check-in from the list (currently just a big row of undifferentiated green dots). It would just be an extra piece of text, I wouldn't expect to use it for filtering or anything like that.

swanson Jan 6, 2023

Imagine I see 20 green dots and one red dot for a daily monitor. Can I quickly identify what the context of the failing one was?

dcramer Jan 6, 2023
Maintainer

That makes sense - ideally there is also an issue created that has some helpful added context, but without deeper SDK integration you likely wouldn't have much there. It's possible we could expose some more details on the check-in page that tie into issues. The concern right now is that I don't believe we have this stuff indexed in a way that makes it cheap.

Also open question on if we should bind contexts to check-ins similar to other events. This would expose tags, env details, etc which would be helpful. e.g. sentry-cli could easily attach a bunch of this.

shaedrich Jan 6, 2023

That would be nice since having all the tags, env details and the like is probably one of the biggest dealbreakers for using sentry in the first place.

swanson · 2023-01-05T19:48:45Z

Editing an existing monitor appears to be broken in the Web UI at the moment. Changing settings and then clicking Save disables the button, but the form doesn't submit, there are JavaScript console errors, and the changes are not applied.

1 reply

shaedrich Jan 6, 2023

I had that, too. However, reloading and trying again seems to have solved that for now.

swanson · 2023-01-05T19:59:30Z

Is the Status field in the top right the status of the most recently finished check-in?

I completed the first check-in after the second one had been missed. I would expect the status to still be missed since the most recently created check-in is missed.

1 reply

evanpurkhiser Feb 10, 2023
Collaborator

I've got this captured in an issue here! #44393

swanson · 2023-01-05T20:14:27Z

It might be nice to have a way to create and complete a check-in in one API call. My use-case: we have a nightly job that enqueues a bunch of other jobs. I don't need to monitor the duration, it is more of a "ping" operation. Right now I would have to do something like:

desc "Tasks that should run ~nightly"
task nightly_scheduler: :environment do
  check_id = Sentry.start_check("some-monitor-id")
  
  Hubspot::SyncMergedObjectsJob.enqueue_all!
  Segment::UpdateOrgGroupJob.perform_later
  Postmark::MaintenanceJob.perform_later
  Plans::TallyDaysAsCurrentJob.enqueue_all!

  Sentry.end_check("some-monitor-id", check_id)
end

Ideally I could just make one API call to "check in" that everything is fine.

6 replies

swanson Jan 5, 2023

Just to clarify, I don't need to upsert a monitor but rather open-and-complete a check-in on an existing monitor in one call.

Upserting a monitor does seem useful to avoid having to configure and store off the id but seems like a separate improvement :)

dcramer Jan 5, 2023
Maintainer

Ah sorry, does #42816 or the discussion in it help?

swanson Jan 5, 2023

Maybe this will be handled by client libraries (the ruby library could have a block that wraps code and completes the check when the block is done executing)?

desc "Tasks that should run ~nightly"
task nightly_scheduler: :environment do
  Sentry.with_monitoring("some-monitor-id") do
    Hubspot::SyncMergedObjectsJob.enqueue_all!
    Segment::UpdateOrgGroupJob.perform_later
    Postmark::MaintenanceJob.perform_later
    Plans::TallyDaysAsCurrentJob.enqueue_all!
  end
end

swanson Jan 5, 2023

Oh, I think I just missed the fact that I can pass status: "ok" in the POST call. I think that will be fine for creating a ping.
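A rough sketch of such a one-call "ping", assuming the POST check-in endpoint has the URL shape quoted earlier in this thread and accepts a JSON body with `status`. `MONITOR_ID`, `SENTRY_AUTH_HEADER`, and `DRY_RUN` are placeholder names introduced for illustration, and the authorization scheme is an assumption to check against the docs.

```shell
# Placeholder monitor id; the URL keeps the required trailing slash.
MONITOR_ID="${MONITOR_ID:-123}"
ping_url="https://sentry.io/api/0/monitors/${MONITOR_ID}/checkins/"

# Creating the check-in with status "ok" opens and completes it in one request.
ping_body='{"status":"ok"}'

# DRY_RUN=1 (the default in this sketch) prints the request instead of sending it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "POST $ping_url $ping_body"
else
  # SENTRY_AUTH_HEADER is a placeholder for whatever credentials the API expects.
  curl -s -X POST "$ping_url" \
    -H "Content-Type: application/json" \
    -H "Authorization: ${SENTRY_AUTH_HEADER}" \
    -d "$ping_body"
fi
```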

dcramer Jan 5, 2023
Maintainer

Trying to get docs in so at least it'll be more exposed what you can/can't do with the endpoints: #42850

swanson · 2023-01-05T20:50:51Z

One weak spot in our Sentry setup is if our background worker queues get overloaded. I would like to periodically send a "tracer bullet" job through the queue as a way to measure how long a job is waiting in the queue before running. For example, if jobs are taking more than 10 minutes to be processed, that is a problem and we need to be alerted.

Could you advise on if you see this as an appropriate use-case for Crons? Or should I try to implement this via Sentry Performance?

I could imagine creating a monitor with max runtime: 10 minutes and schedule type: every 3 hours.

2 replies

dcramer Jan 5, 2023
Maintainer

Thought I left a response but I think it got lost-

You should be able to continuously submit a status: "in_progress" check-in. The weak point is going to be alerting, unfortunately. Right now you're coupled to Issue Alert Rules, and it's tightly coupled to "Errors". So whenever it "Alerts" it just generates a new error that's fingerprinted to the same monitor. This generally means the default alert rules are not ideal.

That said, we have the max runtime parameter you can configure today, and it will automatically alert after that has passed. "Alert" in this case just means creating an event/issue. Does that solve the case you need?

swanson Jan 5, 2023

Yeah, I think that will work! I added Alerts with a filter for monitor.id being present and that should be okay for now.

shaedrich · 2023-01-06T12:36:01Z

Is there a way to navigate to the monitor from an issue created by one? Because if so, I didn't manage to find it. And if not, it'd be nice to have that in the future.

2 replies

gaprl Jan 17, 2023
Maintainer

Hey @shaedrich, we want to improve the details page for a monitor issue, and I agree with you this would be nice to have -- thank you for your feedback!

shaedrich Jan 18, 2023

Hey @gaprl, sounds good, looking forward to it!

ydastous · 2023-10-24T16:00:41Z

Hello, I just installed Sentry on-premise for the first time (self-hosted-23.10.0) and my logs are full of:

docker/sentry-self-hosted_cron_1[861]: 15:59:39 [WARNING] sentry_sdk.errors: Intervals shorter than one minute are not supported by Sentry Crons. Monitor 'sync-options-control' has an interval of 10 seconds. Use the exclude_beat_tasks option in the celery integration to exclude it.

Is there a doc to help me fix this?

2 replies

gaprl Oct 26, 2023
Maintainer

Thanks for reporting this @ydastous, our team is investigating and we'll get back to you soon.

gaprl Dec 2, 2023
Maintainer

Hey @ydastous are you still having trouble with this warning? It should be fixed in 23.11. If yes, please feel free to reach out to us at crons-feedback@sentry.io. Thank you.

JanMikes · 2023-11-01T08:55:25Z

Hi, we are hitting rate limits of the monitors. I have checked the code and, though I am not a Python developer, it seems to me that the limits are currently hardcoded and there is no way around them:

sentry/src/sentry/monitors/consumers/monitor_consumer.py

Line 155 in eb27bc5

is_blocked = ratelimits.is_limited(

What we see in logs:

[INFO] sentry.monitors.consumers.monitor_consumer: monitors.consumer.rate_limited (organization_id=1 slug='api-log-process' environment='production')

Our use case: we have multiple production servers, and on each of them we run this specific cron every minute. Usually these crons are very fast (<1s).

Unfortunately this causes a lot of failures which are false positives - the rate limit can be hit very quickly and afterwards it ignores calls (checkins on start - causing "missed", checkins in the end - causing "timeout").

Check out this example - there are processes that already finished, but due to rate limit it refused the end check-in and sentry considers them as still in progress and after some time they will just timeout.

3 replies

MadsMoenster Nov 22, 2023

Hi @JanMikes, we see some of the same problems; we also have cron jobs with a 1-minute interval and a run time of 1-2 sec. Is the rate limit you refer to the spike protection? I suspect that our check-ins end up in the spike protection when it is activated!

mdefeche Nov 22, 2023

We have the same issue with fast-running crons (e.g. when there are no files to process in a cron import job). They often appear as missed or timeout. They only run daily or monthly, so in our case it's not related to the rate limits.

evanpurkhiser Dec 1, 2023
Collaborator

This is not spike protection, that's specific to errors.

It sounds like what you want here is to use environments, where each of your production machines specifies its environment.

The crons monitoring product is designed to monitor single runs of a scheduled job, it's not designed to coalesce N number of runs for the same expected schedule. When you have a monitor that is expected to check-in once per minute, we do not expect it to check-in more than that.

The rate-limit is designed to protect against abuse since a monitor that is configured to check-in once per minute should not be expected to check-in more than that.
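A minimal sketch of the per-machine environment idea, assuming the `environment` field of the check-in payload is used as in the helper script shared elsewhere in this thread. The `production-<host>` naming scheme and the `machine_env` function are hypothetical.

```shell
# Hypothetical naming scheme: one environment per machine, e.g. "production-web-01".
machine_env() {
  # Strip any domain suffix from the host name passed as $1.
  printf 'production-%s\n' "${1%%.*}"
}

# uname -n prints the network node name (hostname) of the current machine.
env_name=$(machine_env "$(uname -n)")

# Check-in body carrying the per-machine environment.
payload=$(printf '{"status":"in_progress","environment":"%s"}' "$env_name")
echo "$payload"
```

With this, each server checks in once per minute under its own environment instead of all servers sharing a single `production` schedule.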

ben-z · 2023-11-04T18:00:57Z

It would be nice to have a read-only API so that we can show the up/down statuses of checks on an external platform!

3 replies

gaprl Dec 4, 2023
Maintainer

Hey @ben-z you can use our API to accomplish this, check out the documentation. You can use the "status" field for your monitor.

ben-z Dec 4, 2023

I'd like a read-only token that I can include in a static page hosted publicly. From the documentation you linked and Permissions & Scopes, it's unclear to me whether I can generate an API token that's restricted to reading a single cron's status. Is this use case supported?

gaprl Dec 4, 2023
Maintainer

@ben-z unfortunately that is not currently supported, we're only able to scope our tokens by project only.

LuckySpb · 2023-11-08T14:19:22Z

It would be great to have an option to remove certain environments from a monitor slug.
In a situation where several tests were performed from a dev machine with a separate environment tag, this environment remains in the monitor when it is no longer needed.

3 replies

davidenwang Nov 9, 2023

Hi there, we actually have this option and are looking for ways to make it more discoverable. If you hover the environment name you should be presented an option to delete the environment inside of the listing page.

LuckySpb Nov 9, 2023

Thanks, I was originally trying to delete environment on the monitor detailed page, not on the overview page with multiple monitors.

therealarkin Dec 5, 2023
Author

@vuluongj20 this is good feedback. We should probably make this discoverable on the details page.

MarcHagen · 2023-11-13T16:52:36Z

We have multiple platforms with the same code. So different environments.
But each environment has its own timezone. Mailing clients and stuff.
Of course, we can change the reporting time to be in UTC, but it would be nice to have timezones per environment.

3 replies

therealarkin Dec 5, 2023
Author

@MarcHagen do you mind filing a new Feature Request: https://github.com/getsentry/crons-feedback/issues/new ? Thanks!

MarcHagen Dec 17, 2023

@therealarkin I think you should do it as that repo is, just a guess, internal/private...

therealarkin Dec 17, 2023
Author

You are right! Sorry

MadsMoenster · 2023-11-22T08:50:26Z

We have cron jobs running every minute, and when comparing with our logs we see some strange missed check-ins and timeouts. The jobs' execution time is short, 1-2 sec. We have spike protection activated. I was wondering if the check-ins could be rejected by the spike protection, since the check-ins are errors?

2 replies

JanMikes Nov 22, 2023

Hi, might be related to this: #42283 (comment)

gaprl Dec 4, 2023
Maintainer

Hey @MadsMoenster, are you still experiencing this? If yes, can you let me know which SDK version you are currently using? Feel free to reach out to us at crons-feedback@sentry.io, we'd be glad to help you.

moshez · 2023-11-22T23:15:48Z

The public beta seemed to have started almost a year ago. Are there plans to mark it as production ready sometime soon? I'm not comfortable depending on a beta feature for monitoring things.

1 reply

gaprl Dec 2, 2023
Maintainer

Hey! We are committed to Cron monitoring and we will GA soon. The feature is marked as Beta because we are working on some improvements, especially in stability, before making it generally available to the public.

gumho · 2023-12-01T23:29:27Z

After using Cron Monitoring for the past couple months, the thing I like about it is the simplicity in set up and how well it works when it works. On the flip side, we receive a lot of "no-checkins" and timeouts even though the cron jobs ran just fine. It would be nice to see some attention on this issue as I've seen it mentioned in a few places already including this discussion thread.

1 reply

gaprl Dec 2, 2023
Maintainer

Hey! Thank you for the kind words, and we are sorry about this experience. The feature is in beta and our top priority is to improve its stability. We have made great strides but still have some work to do.

Did you have issues in the past 7 days? If so, do you mind emailing crons-feedback@sentry.io? Thank you.

mstrujic-cls · 2023-12-07T14:29:50Z

We've discovered this CRON monitoring and it is a very convenient way to track job execution (especially to catch missed executions). I was trying to use sentry-cli but it wasn't handy as we are using a self-hosted Sentry installation. I needed to spend some time writing some bash that uses plain curl to post statuses. Also, sometimes a job cannot simply be wrapped, as many scripts are coded not to throw errors, etc.

I've made it so it creates monitor automatically if not there, also it is handy to use different environments to avoid mixing everything together.

It would be cool, if there is a simple explanation how to enrich errors automatically created by Sentry by passing some string or to upload a short log or something like that. I know there must be some way through envelope and events but I need to spend some time deciphering how to link it to cron. I see in the UI there is a column attachment but no documentation how to use it. What if you add an additional body property that can contain simple string or event or something like that?

Also, if you add spans, one day we would be able to have a script execution checkpoints and measure duration of each step. Something similar what already exists in Performance. In the end you have phases like blue "request", yellow "DB something", orange "something filesystem", etc.

Any help/suggestion would be appreciated. Thank you for adding this cron monitoring feature to Sentry.

CC: @rniv-cls

P.S. Sharing my helper bash if someone would see any value (sentry_helper.sh):

# In order to make it work, three parameters are required; the rest are optional:
#   - SENTRY_DSN (required; will be related to a project in sentry)
#   - SENTRY_NAME (required; CRON monitor slug)
#   - SENTRY_SCHEDULE (required; crontab schedule of the monitor)
#   - SENTRY_ENV (defaults to production)
#   - SENTRY_TZ (defaults to America/Chicago)
#   - SENTRY_BASEURL (defaults to https://sentry.io)
# If a monitor_slug is not registered to Sentry, it will add it automatically. Project must exist.

# All three required variables must be non-empty.
if [ -z "$SENTRY_DSN" ] || [ -z "$SENTRY_NAME" ] || [ -z "$SENTRY_SCHEDULE" ]; then
  echo "You need to set SENTRY_DSN, SENTRY_NAME (monitor slug) and SENTRY_SCHEDULE environment variables before start. Exiting...";
  exit 1;
fi

# Setting a default environment
SENTRY_ENV=${SENTRY_ENV:-production}

# Setting a default timezone
SENTRY_TZ=${SENTRY_TZ:-"America/Chicago"}

# Setting a default base url
SENTRY_BASEURL=${SENTRY_BASEURL:-https://sentry.io}

# Register functions
function sentry_report_start(){
  curl -s \
    -X POST "$SENTRY_BASEURL/api/0/organizations/sentry/monitors/$SENTRY_NAME/checkins/" \
    -H "Content-Type: application/json" \
    -H "Authorization: DSN $SENTRY_DSN" \
    -d "{\"monitor_config\":{\"schedule\":{\"type\":\"crontab\",\"value\":\"$SENTRY_SCHEDULE\"},\"timezone\":\"$SENTRY_TZ\"},\"status\":\"in_progress\",\"environment\":\"$SENTRY_ENV\"}"
}

function sentry_report_error(){
  curl -s \
    -X PUT "$SENTRY_BASEURL/api/0/organizations/sentry/monitors/$SENTRY_NAME/checkins/$checkin_id/" \
    -H "Content-Type: application/json" \
    -H "Authorization: DSN $SENTRY_DSN" \
    -d "{\"status\":\"error\",\"environment\":\"$SENTRY_ENV\"}" > /dev/null
}

function sentry_report_success(){
  curl -s \
    -X PUT "$SENTRY_BASEURL/api/0/organizations/sentry/monitors/$SENTRY_NAME/checkins/$checkin_id/" \
    -H "Content-Type: application/json" \
    -H "Authorization: DSN $SENTRY_DSN" \
    -d "{\"status\":\"ok\",\"environment\":\"$SENTRY_ENV\"}" > /dev/null
}

function sentry_report_ping(){
  curl -s \
    -X PUT "$SENTRY_BASEURL/api/0/organizations/sentry/monitors/$SENTRY_NAME/checkins/$checkin_id/" \
    -H "Content-Type: application/json" \
    -H "Authorization: DSN $SENTRY_DSN" \
    -d "{\"environment\":\"$SENTRY_ENV\"}" > /dev/null
}

# Send a CRON start message
checkin_id=$(sentry_report_start | grep -Eo '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}')

To use it, you need to set variables and then call it:

SENTRY_DSN=...
SENTRY_ENV=...
...
# This is needed to handle relative path
source "$(dirname "$0")/sentry_helper.sh"
# This will report the start and then you just need to report success or error

#... do your code

# if success
sentry_report_success

# if error 
sentry_report_error
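
The manual success/error calls above can also be driven by the job's exit status. A minimal sketch, not part of the original helper: `run_and_report` is a hypothetical name, and the two `sentry_report_*` stubs only stand in for the real functions so the snippet runs on its own without network access.

```shell
#!/usr/bin/env bash
# Stand-ins for the functions defined in sentry_helper.sh (assumption: the
# real ones are available once the helper has been sourced). Remove these
# stubs when using the real helper.
sentry_report_success() { echo "reported: ok"; }
sentry_report_error()   { echo "reported: error"; }

# Hypothetical wrapper: run the given command, report the matching status,
# and pass the command's exit code through to the caller.
run_and_report() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    sentry_report_success
  else
    sentry_report_error
  fi
  return "$status"
}

run_and_report true          # reports success
run_and_report false || :    # reports error; "|| :" keeps the script going
```

With the real helper sourced, the job body would become something like `run_and_report ./my_job.sh`, where `./my_job.sh` is a placeholder for your own script.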

2 replies

therealarkin Dec 9, 2023
Author

Thank you!

For the script cc: @chadwhitacre it might help other OSS folks.

For the second part: "it would be cool, if there is a simple explanation of how to enrich errors"... Is this for Crons, or for Sentry errors in general?

mstrujic-cls Jan 11, 2024

For crons. It would be nice to have an easy way to attach a log or file, because when something goes wrong we need to search for it, and other parts of Sentry already offer that kind of information. I guess the capability is there; we just need to discover which additional POST request is needed, or which additional property can carry the error text. For example, when an execution misses its scheduled time, an issue is created automatically. In case of an error, we should be able to enrich that issue with additional data, error context, a message, etc.

Braunson · 2023-12-08T19:18:07Z

Braunson
Dec 8, 2023

Being able to specify which environments (in Laravel) the monitor runs on/checks against is crucial. I have many E2E, review-app, staging, and production environments. I only want and need the monitor to be active on staging/production; however, this doesn't seem to be possible. It would be a great feature to have. Until then I have to disable ->sentryMonitor(), as it's killing the quota by running on environments that don't need it (note: I do want Sentry to run on those environments, just not sentryMonitor).

2 replies

Braunson Dec 8, 2023

I've put a PR in for this feature getsentry/sentry-laravel#816

therealarkin Dec 9, 2023
Author

Your quota isn't being impacted right now, since Cron is in beta. Thank you for sending the PR out! cc: @cleptric

Calvin-Davidson · 2023-12-18T18:17:24Z

Calvin-Davidson
Dec 18, 2023

Cronjobs Configuration:

Our cronjobs are diversified with various schedules like DailyAtTen, DailyAtNoon, etc. Each cron job triggers processors responsible for executing specific tasks determined by user settings. For instance, the DailyTasks CronJob manages a list of processes, calling each one sequentially. The initiated processor incorporates a method for retrieving the users it needs to handle. Once the processor completes the processing for all users, it proceeds to the next processor in line. The cron job is considered done only after all processors have completed their respective tasks.

Monitoring Tools:

We use a solution that records the cron start date, end date, and errors in the database. This allows us to retrieve the data for analysis, such as calculating the average time for a process and a cron task.
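
The pattern described above (processors run one after another, with a start time, end time, and outcome recorded for each) could be sketched roughly as follows. This is an illustration only, not the poster's implementation: `run_processors`, `sync_users`, and `send_reminders` are hypothetical names, and the records are printed rather than written to a database.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run each processor in order and emit one record per
# processor as "name,start_epoch,end_epoch,status". A real setup would insert
# these rows into the database instead of printing them.
run_processors() {
  local name start end status
  for name in "$@"; do
    start=$(date +%s)
    if "$name"; then status=ok; else status=error; fi
    end=$(date +%s)
    echo "$name,$start,$end,$status"
  done
}

# Example processors (placeholders for real work).
sync_users()     { :; }          # always succeeds
send_reminders() { return 1; }   # always fails

run_processors sync_users send_reminders
```

The duration of a processor is `end - start`, so averaging that difference over stored rows gives the per-process averages mentioned above.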

Monitoring Challenges:

While our current setup effectively executes cron jobs and processes, we acknowledge the challenges in monitoring. Specifically, there is difficulty in tracking processed users, understanding the status of tasks within the cron job, and distinguishing between manual and automatic triggers.

To address these challenges, we are considering working on enhancements. One key improvement we are considering is the ability to cancel running processes. This enhancement aims to provide better control and visibility, allowing us to halt processes if needed and obtain more accurate insights into the progression of tasks.

Cronjob Failures and Warnings:

In addition to managing failures, our cron jobs have been enhanced to capture warnings. While the cron jobs themselves typically run successfully, our system is designed to detect scenarios where the process completes, yet certain users are skipped based on specific rules or settings.

These warnings offer valuable insights into instances where the process might deviate from the expected execution. User-specific configurations or rules may lead to the exclusion of certain users during task execution. By capturing these warnings, we gain visibility into potential deviations from the intended processing flow.

Our monitoring system is structured to effectively differentiate between failures and warnings. This capability allows us to proactively identify and address situations where the process completes but includes user-specific skips. This additional layer of information contributes to a more comprehensive understanding of cron job execution outcomes.

Priority of Fixing Failed Cronjobs:

Fixing failed cronjobs is not a high priority since cron jobs don't fail. However, addressing task failures for specific users is important, but the current setup lacks visibility into which process failed for each user.

Note: The current monitoring system is rather bare bones, and the existing dashboard, while functional, requires custom SQL queries for a detailed analysis of processor and cron results.

Integration Inquiry:

We are considering moving our cron monitoring to Sentry as a third-party solution to streamline management and enhance efficiency. In our current workflow, a single cron job manages multiple processes, and we are particularly interested in understanding if Sentry's tool supports a workflow where each individual process within the cron job schedule can be viewed separately. Our goal is to avoid creating a new Cronjob-Monitor for every processor, especially since processors may be reused across multiple cron jobs.

0 replies

therealarkin · 2023-12-21T19:38:20Z

therealarkin
Dec 21, 2023
Author

Hello,
TL;DR: If you instrumented Cron Monitoring using Ruby, PHP, or Go, please read this message.

Thank you all for the reports of instability with Crons recently. We found a bug in the Ruby, PHP, and Go SDKs concerning Crons: if you set a sample_rate, it was wrongly applied to check-ins as well, meaning some check-ins were sampled out and never sent to Sentry.
This was fixed in Ruby 5.15.2 and PHP 4.2.0. Getting the Go SDK update out soon, too!
We checked all other SDKs as well and didn’t see the same issue there.

Thank you again for being part of this Beta!

1 reply

cleptric Jan 10, 2024
Collaborator

Go 0.26.0 was released, including a fix for the above-mentioned issue as well.

foobarna · 2024-03-05T13:22:24Z

foobarna
Mar 5, 2024

Since last night, we've started getting "error: resource not found" from sentry-cli when running cronjobs with a monitor. No changes were made; it worked fine for months.

[ec2-user@ec2 ~]$ sentry-cli monitors list
+--------------------------------------+--------------------------------------+---------------------------+--------+
| ID                                   | Slug                                 | Name                      | Status |
+--------------------------------------+--------------------------------------+---------------------------+--------+
| 7f5a<redacted                  >4656 | 7f5a<redacted                  >4656 | Monitor name 1            | active |
| 61c<redacted                  >d0f17 | 61<redacted                     >f17 | Monitor name 2            | active |
| fd4<redacted                    >013 | fd49<redacted                    >13 | Monitor name 3            | active |
+--------------------------------------+--------------------------------------+---------------------------+--------+


[ec2-user@ec2 ~]$ sentry-cli monitors run --log-level debug fd4<  redacted  >c013 -- echo 2222

  INFO    2024-03-05 13:18:08.554491849 +00:00 Loaded config from /home/ec2-user/.sentryclirc
  DEBUG   2024-03-05 13:18:08.554538971 +00:00 sentry-cli version: 2.29.1, platform: "linux", architecture: "x86_64"
  INFO    2024-03-05 13:18:08.554563927 +00:00 sentry-cli was invoked with the following command line: "sentry-cli" "monitors" "run" "--log-level" "debug" "fd4<redacted>c013" "--" "echo" "2222"
  WARN    2024-03-05 13:18:08.554588919 +00:00 Token auth is deprecated for cron monitor checkins and will be removed in the next major version.
  WARN    2024-03-05 13:18:08.554598329 +00:00 Please use DSN auth.
  DEBUG   2024-03-05 13:18:08.555010795 +00:00 request POST https://sentry.io/api/0/monitors/fd4<redacted>c013/checkins/
  DEBUG   2024-03-05 13:18:08.555038035 +00:00 using token authentication
  DEBUG   2024-03-05 13:18:08.555062563 +00:00 json body: {"status":"in_progress","environment":"production"}
  DEBUG   2024-03-05 13:18:08.555085068 +00:00 retry number 0, max retries: 0
  DEBUG   2024-03-05 13:18:08.573807639 +00:00 > POST /api/0/monitors/fd<redacted>c013/checkins/ HTTP/1.1
  DEBUG   2024-03-05 13:18:08.573828625 +00:00 > Host: sentry.io
  DEBUG   2024-03-05 13:18:08.573842850 +00:00 > Accept: */*
  DEBUG   2024-03-05 13:18:08.573856098 +00:00 > Connection: TE
  DEBUG   2024-03-05 13:18:08.573869548 +00:00 > TE: gzip
  DEBUG   2024-03-05 13:18:08.573880870 +00:00 > User-Agent: sentry-cli/2.29.1
  DEBUG   2024-03-05 13:18:08.574679036 +00:00 > Authorization: Bearer c*******
  DEBUG   2024-03-05 13:18:08.574695194 +00:00 > Content-Type: application/json
  DEBUG   2024-03-05 13:18:08.574705663 +00:00 > Content-Length: 51
  DEBUG   2024-03-05 13:18:08.723766352 +00:00 < HTTP/1.1 404 Not Found

5 replies

gaprl Mar 5, 2024
Maintainer

Hey there @foobarna, you are likely using our legacy API endpoints, please check out our legacy endpoint migration doc on how to switch to our new endpoints.

foobarna Mar 5, 2024

Hey @gaprl, thanks for the follow up. I am not using any HTTP endpoints - I am using the sentry-cli tool, which I hope is the latest version (2.29.1, as seen in the output) because I've done a sentry-cli update trying to remedy this.

gaprl Mar 5, 2024
Maintainer

Hey @foobarna, our CLI will use our legacy endpoints if you are authorizing via auth tokens. Please update to use your DSN instead. It's our bad for not documenting this in the legacy migration guide -- we'll make sure to add that in. Thanks for your understanding.
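
For anyone making the same switch, a small guard can keep a job from silently falling back to token auth. A minimal sketch, assuming sentry-cli picks up the DSN from the `SENTRY_DSN` environment variable as its documentation describes; `run_monitored` is a hypothetical wrapper name, and the slug and command in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to run unless SENTRY_DSN is set, so check-ins
# are not sent with the deprecated token auth by accident.
run_monitored() {
  local slug=$1
  shift
  if [ -z "${SENTRY_DSN:-}" ]; then
    echo "SENTRY_DSN is not set; not starting the monitored command" >&2
    return 2
  fi
  sentry-cli monitors run "$slug" -- "$@"
}

# Usage (values are placeholders):
#   export SENTRY_DSN="..."                   # the project's DSN
#   run_monitored my-monitor-slug echo 2222
```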

thelfensdrfer Apr 9, 2024

@gaprl so this guide here is outdated? https://docs.sentry.io/product/cli/configuration/#configuration-file

For most functionality you need to authenticate with Sentry. Setting this up can be done either automatically, using sentry-cli, or manually via Organization Auth Tokens.

gaprl Apr 9, 2024
Maintainer

Good catch @thelfensdrfer, we'll add a note in there as it does not apply for Crons. Thanks for pointing this out!

serogers · 2024-03-11T14:10:59Z

serogers
Mar 11, 2024

We're seeing issues with the change to Daylight Saving Time.

Example: a cron is scheduled in the NY timezone. Our logs show the jobs switching to DST, but we are now getting alerts about missed jobs. It looks like the monitors do not recognize the switch to DST. Please advise.

cron: "0 1 * * * America/New_York"
Job at 2024-03-10T06:00:01Z # UTC
Job at 2024-03-11T05:00:01Z # UTC
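
The expected one-hour shift in UTC can be checked by hand. A minimal sketch, assuming GNU date (the `TZ="..."` prefix inside `-d` is a GNU extension) and an installed tz database; `local_to_utc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Convert a wall-clock time in a named zone to UTC.
# Assumes GNU date and an installed tz database; without the zone data the
# zone name is silently treated as UTC.
local_to_utc() {
  # $1 = IANA zone name, $2 = local timestamp, e.g. "2024-03-10 01:00"
  date -u -d "TZ=\"$1\" $2" +%Y-%m-%dT%H:%M:%SZ
}

local_to_utc America/New_York "2024-03-10 01:00"   # before the change: EST, UTC-5
local_to_utc America/New_York "2024-03-11 01:00"   # after the change: EDT, UTC-4
```

Under those assumptions the first call gives 06:00 UTC and the second 05:00 UTC, which lines up with the two job timestamps above.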

7 replies

serogers Mar 11, 2024

Thanks @gaprl. We're seeing the correct timezone applied on our servers, as well as the code executing the jobs.

$ env | grep TZ
TZ=America/New_York

$ date
Mon Mar 11 13:01:10 EDT 2024

In my original example above, the job triggered on time near 1am ET / 5am UTC, but the monitor flagged it as missing one hour earlier, at 12am ET / 4am UTC.

That was near the 2am time change ET, but we see it on other jobs too:

# Every weekday at 10:15am, 1:15pm, 3:15pm, 4:15pm
cron: "15 10,13,15,16 * * 1-5 America/New_York"

We'll keep poking around to see if we can find the source of the mismatch, but if there's anything else we can do on our end let me know!

evanpurkhiser Mar 11, 2024
Collaborator

@serogers just to confirm, your monitor is configured on Sentry to use the America/New_York timezone here, right?

gaprl Mar 13, 2024
Maintainer

Hey @serogers, we were able to find an issue with the DST change, but it should happen only once -- we'll make sure to fix it for next year. Can you confirm your monitors are working as expected now? Otherwise, can you share your monitor URL for us to take a look? Thanks.

serogers Mar 13, 2024

Hi @gaprl, yes we're seeing all our monitors working as expected now. We also checked the TZ setting and all were set to NY, with one being set to Eastern. All is looking good now though, thanks for following up 🙏

evanpurkhiser Mar 13, 2024
Collaborator

Turns out there is a bug in the library we're using to compute cron schedules ☠️

Here's another report #66763

will split this into a ticket we can track to get this fixed in upstream

klemmchr · 2024-04-29T17:36:59Z

klemmchr
Apr 29, 2024

I'm a bit confused about the pricing model of cronjobs. I'm unable to find any pricing for cron monitors but I can increase the budget for them. So how much do they cost each and why is there only a single one included in each plan by default?

2 replies

gaprl Apr 29, 2024
Maintainer

Hey @klemmchr you can find pricing for Crons in our docs and in our pricing page. Here's a quick link: https://docs.sentry.io/product/accounts/pricing/#cron-monitors-pricing

klemmchr Apr 29, 2024

> Hey @klemmchr you can find pricing for Crons in our docs and in our pricing page. Here's a quick link: https://docs.sentry.io/product/accounts/pricing/#cron-monitors-pricing

Thanks. Would be great if this information could be found on the pricing page on sentry.io itself.
