
How many scheduled tasks were used for the benchmark? #209

Closed
gianielsevier opened this issue Jun 2, 2021 · 23 comments

@gianielsevier

Hi there,

I'm looking for an alternative to Quartz, and I think your solution could be the one.
Today we use Quartz heavily and can have over 14 million triggers in our DB. Quartz is not behaving well at this scale, and adding more instances to the cluster doesn't bring any benefit; the triggers are being delayed a lot.

I would like to know what the limits of db-scheduler are, and whether we can add more instances to scale with the growing number of scheduled tasks.

@kagkarlsson
Owner

Hi!

Could you describe a bit more what type of tasks you have? 14 million recurring tasks? How often are they running?

For the benchmark I created synthetic executions scheduled to run now(), maybe 2 million each time. But I don't think the number of executions in the table should affect performance that much, as long as it is indexed properly.
What kind of throughput do you require (executions/s) and what database are you using?

@kagkarlsson
Owner

Scaling depends a bit on the tasks as well. Up to the point where the database becomes the bottleneck you can increase throughput by adding instances. If the task does nothing database-related, tests indicate you should be able to reach 10k executions/s.

@gianielsevier
Author

Hi, @kagkarlsson sorry for my delayed reply.

Let me explain our use case.

We have different clients that can come to our application and create/update/delete a trigger to run at any time.
Our clients are different websites with millions of users interested in receiving recurring information, and they use our system to store it. All of our triggers are created dynamically, and we can have thousands running at the same time every second.

The number of triggers just keeps growing.

Please let me know if you have any other questions.

@kagkarlsson
Owner

The limiting factor will be the number of triggers running to completion per second. If these triggers/tasks take, say, 10s to run and there are 1000 running in parallel, that is approximately 100 completions/second (also referred to as executions/s).

If you have long-running tasks like that, you will likely first be limited by the size of the thread-pool. That can be increased both per instance (configurable) and by adding more instances.

If you reach a point where you need more than, say, 10,000 completions/s, you might need to use multiple databases and split the triggers/executions between them (i.e. sharding).
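The arithmetic above can be sketched as a quick back-of-the-envelope check, using the numbers from this comment (class and method names are just for illustration):

```java
// Back-of-the-envelope throughput estimate:
// completions/second ≈ executions running in parallel / average execution duration.
public class ThroughputEstimate {

    public static double completionsPerSecond(int parallelExecutions, double avgDurationSeconds) {
        return parallelExecutions / avgDurationSeconds;
    }

    public static void main(String[] args) {
        // 1000 executions in parallel, each taking ~10s => ~100 completions/s
        System.out.println(completionsPerSecond(1000, 10.0)); // 100.0
        // Past roughly 10,000 completions/s, sharding across databases becomes necessary.
    }
}
```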

How long does a typical trigger / execution / task run?

> create/update/delete a trigger to run any time

Are these one-time tasks, or recurring on a schedule? If recurring, what is the typical schedule?

> Our clients are different websites with millions of users interested in receiving recurrent information and for that, they use our system to save it

Is one trigger created per user?

@gianielsevier
Author

> How long does a typical trigger / execution / task run?

It should take a maximum of 1 second.

> Are these one-time tasks, or recurring on a schedule? If recurring, what is the typical schedule?

They are always recurring tasks.

> Is one trigger created per user?

It can be one or more per user.

@kagkarlsson
Owner

> Are these one-time tasks, or recurring on a schedule? If recurring, what is the typical schedule?
>
> They are always recurring tasks.

What is the schedule? Are they evenly spread in time, or are there peaks?

I still feel that I don't have the complete picture here. Currently, at what threshold of executions/s do you start to experience problems? And how far are you hoping to push that using db-scheduler? Keep in mind that the key metric here is executions/s.

@gianielsevier
Author

gianielsevier commented Jun 23, 2021

Hi @kagkarlsson, I've started the POC and I have a question.
I'm trying to use the Spring version with tasks created dynamically based on requests coming from a controller.
The tasks are being persisted to the database, but the column task_data is always null.
I'm also confused about how to handle the trigger when it's time to run it.

I've tried to follow the examples from here:
https://github.com/kagkarlsson/db-scheduler/blob/master/examples/features/src/main/java/com/github/kagkarlsson/examples/PersistentDynamicScheduleMain.java

This is the code I'm using to create the task:

Note.: scheduler is

```java
@Service
public class SchedulerService {

    private final ExecutionRunner executionRunner;
    private final CronTriggerBuilder cronTriggerBuilder;
    private final Scheduler scheduler;

    public SchedulerService(final ExecutionRunner executionRunner,
                            final CronTriggerBuilder cronTriggerBuilder,
                            final Scheduler scheduler) {
        this.executionRunner = executionRunner;
        this.cronTriggerBuilder = cronTriggerBuilder;
        this.scheduler = scheduler;
    }

    public void create(final DummyPojo pojo) {
        String idOne = pojo.getIdOne();
        String idTwo = pojo.getIdTwo();

        SerializableSchedule serializableSchedule =
                new SerializableSchedule(idOne, idTwo, cronTriggerBuilder.build(pojo));

        RecurringTask<SerializableSchedule> task =
                Tasks.recurring(UUID.randomUUID().toString(), serializableSchedule, SerializableSchedule.class)
                     .execute(executionRunner);

        Instant newNextExecutionTime =
                serializableSchedule.getNextExecutionTime(ExecutionComplete.simulatedSuccess(Instant.now()));

        TaskInstance<SerializableSchedule> instance = task.instance(idOne);

        scheduler.schedule(instance, newNextExecutionTime);
    }
}
```

This is the execution runner class:

```java
@Component
public class ExecutionRunner implements VoidExecutionHandler<SerializableSchedule> {

    private final SQSService sqsService;

    public ExecutionRunner(final SQSService sqsService) {
        this.sqsService = sqsService;
    }

    @Override
    public void execute(final TaskInstance<SerializableSchedule> taskInstance,
                        final ExecutionContext executionContext) {

        SerializableSchedule serializableSchedule = taskInstance.getData();

        if (serializableSchedule != null) {
            long scheduledTimeEpochSeconds =
                    executionContext.getExecution().executionTime.getEpochSecond();

            SQSMessage message = new SQSMessage();
            message.setIdOne(serializableSchedule.getIdOne());
            message.setIdTwo(serializableSchedule.getIdTwo());
            message.setRandomId(UUID.randomUUID().toString());
            message.setScheduledTimeEpochSeconds(scheduledTimeEpochSeconds);

            sqsService.send(message);
        }
    }
}
```

This is the SerializableSchedule class:

public class SerializableSchedule implements Serializable, Schedule {

    private final String idOne;

    private final String idTwo;

    private final String cronPattern;

    public SerializableSchedule(final String idOne, final String idTwo, final String cronPattern) {
        this.idOne = idOne;
        this.idTwo = idTwo;
        this.cronPattern = cronPattern;
    }

    @Override
    public Instant getNextExecutionTime(ExecutionComplete executionComplete) {
        return new CronSchedule(cronPattern).getNextExecutionTime(executionComplete);
    }

    @Override
    public boolean isDeterministic() {
        return true;
    }

    public String getIdOne() {
        return idOne;
    }

    public String getIdTwo() {
        return idTwo;
    }

    public String getCronPattern() {
        return cronPattern;
    }

    @Override
    public String toString() {
        return "SerializableCronSchedule pattern=" + cronPattern;
    }
}

@kagkarlsson
Owner

kagkarlsson commented Jun 24, 2021

```java
RecurringTask<SerializableSchedule> task =
        Tasks.recurring(UUID.randomUUID().toString(), serializableSchedule, SerializableSchedule.class)
             .execute(executionRunner);
```

You only do this once, at scheduler construction and startup. Inject a reference to the task into SchedulerService and create instances from that. You probably also want to use a CustomTask and disable scheduleOnStartup(...). A RecurringTask is automatically added when the scheduler starts if it does not already exist.
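A minimal wiring sketch of that advice, assuming the db-scheduler Spring Boot starter (which auto-registers `Task` beans) and the `SerializableSchedule`/`ExecutionRunner` classes from the earlier comments. Bean names are illustrative and the import paths are from memory; treat this as a sketch, not a verified implementation for a specific db-scheduler version:

```java
import java.time.Instant;

import com.github.kagkarlsson.scheduler.Scheduler;
import com.github.kagkarlsson.scheduler.task.ExecutionComplete;
import com.github.kagkarlsson.scheduler.task.helper.CustomTask;
import com.github.kagkarlsson.scheduler.task.helper.Tasks;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
class TaskConfiguration {

    // Defined ONCE: a single CustomTask whose instances each carry their own schedule.
    // scheduleOnStartup(..) is deliberately left out, since instances are added at runtime.
    @Bean
    CustomTask<SerializableSchedule> dynamicRecurringTask(ExecutionRunner executionRunner) {
        return Tasks.custom("dynamic-recurring-task", SerializableSchedule.class)
            .execute((taskInstance, executionContext) -> {
                executionRunner.execute(taskInstance, executionContext);
                // Reschedule according to the schedule persisted with this instance
                return (executionComplete, executionOperations) ->
                    executionOperations.reschedule(
                        executionComplete,
                        taskInstance.getData().getNextExecutionTime(executionComplete));
            });
    }
}

@Service
class SchedulerService {

    private final CustomTask<SerializableSchedule> task;  // injected, never re-created per request
    private final Scheduler scheduler;

    SchedulerService(CustomTask<SerializableSchedule> task, Scheduler scheduler) {
        this.task = task;
        this.scheduler = scheduler;
    }

    void create(String id, SerializableSchedule schedule) {
        scheduler.schedule(
            task.instance(id, schedule),
            schedule.getNextExecutionTime(ExecutionComplete.simulatedSuccess(Instant.now())));
    }
}
```

The key point is that the Task object is created once as a bean, while `task.instance(id, data)` is cheap and can be called per request.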

@kagkarlsson
Owner

I have gotten a couple of other questions along these lines, which has made it clear that I need a better Spring Boot example for tasks with a dynamic schedule that are added at runtime.

@kagkarlsson
Owner

Also, for more robust serialization, you may want to consider setting a custom JsonSerializer (also something I need to add an example for).
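To make the serializer idea concrete, here is a self-contained round trip using plain JDK serialization. The `TaskDataSerializer` interface below is defined locally to mirror the general shape of a pluggable serializer and is NOT db-scheduler's actual type; in practice you would configure a JSON-based serializer (e.g. Jackson-backed) on the scheduler builder instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializerSketch {

    // Locally-defined interface mirroring the shape of a pluggable task-data
    // serializer (hypothetical; not db-scheduler's actual type).
    public interface TaskDataSerializer {
        byte[] serialize(Object data);
        <T> T deserialize(Class<T> clazz, byte[] bytes);
    }

    // JDK-serialization implementation. A JSON-based implementation is more
    // robust across class changes, which is the point of the comment above.
    public static class JavaSerializer implements TaskDataSerializer {
        public byte[] serialize(Object data) {
            try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
                 ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(data);
                oos.flush();
                return bos.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public <T> T deserialize(Class<T> clazz, byte[] bytes) {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return clazz.cast(ois.readObject());
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        TaskDataSerializer serializer = new JavaSerializer();
        byte[] bytes = serializer.serialize("0 0 * * *");
        System.out.println(serializer.deserialize(String.class, bytes)); // 0 0 * * *
    }
}
```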

@kagkarlsson
Owner

This is just setting up the implementation. I see that execute(..) is not the best choice of method name; maybe it should be called onExecute(...).

```java
final CustomTask<SerializableCronSchedule> task = Tasks.custom("dynamic-recurring-task", SerializableCronSchedule.class)
    .scheduleOnStartup(RecurringTask.INSTANCE, initialSchedule, initialSchedule)
    .onFailure((executionComplete, executionOperations) -> {
        final SerializableCronSchedule persistedSchedule =
                (SerializableCronSchedule) executionComplete.getExecution().taskInstance.getData();
        executionOperations.reschedule(executionComplete, persistedSchedule.getNextExecutionTime(executionComplete));
    })
    .execute((taskInstance, executionContext) -> {
        final SerializableCronSchedule persistentSchedule = taskInstance.getData();
        System.out.println("Ran using persistent schedule: " + persistentSchedule.getCronPattern());

        return (executionComplete, executionOperations) -> {
            executionOperations.reschedule(
                executionComplete,
                persistentSchedule.getNextExecutionTime(executionComplete)
            );
        };
    });
```

@gianielsevier
Author

Hey, @kagkarlsson many thanks for your help. 🙌
Now it is working as expected. We will prepare the tests and I'll give you an update.

@kagkarlsson
Owner

Np. Will be interesting to hear the results. Sounded like a very high-throughput use case.

@gianielsevier
Author

Hi @kagkarlsson,

I've finally managed to find time and come back with results.

The POC numbers:
We created 14 million custom recurring tasks.
The tasks were set up to run at roughly 2 million per day of the week, distributed across the 24 hours of the day.
We ran the application on K8s with 4 dedicated pods, each with 500 MB of memory and 0.5 CPU cores.
The database was PostgreSQL on an AWS db.m6g.large instance (8 GB of memory, 2 vCPUs); this instance also handles other applications, mainly using Quartz (this is our non-prod environment).

Application behaviour:
Saving the tasks:
We had an endpoint where a client can send a payload asking to save a scheduler (task), giving a day of the week and the time it should run (they are always recurring).

Running the tasks:
Once it was time to run a task, the app was triggered by the db-scheduler library, which collected the information about the task and sent a message to AWS SQS.

The aim of this POC was to check whether db-scheduler could handle millions of schedulers (tasks) without delaying their execution (the main issue we have with Quartz today).
We also wanted to make sure that db-scheduler could scale horizontally without overloading the DB and causing delays.
To check the delay, we basically took the current time minus the task's planned execution time and logged it. Our logs also print which pod did the job.

After tuning the configs below:
db-scheduler.threads
db-scheduler.polling-strategy-lower-limit-fraction-of-threads
db-scheduler.polling-strategy-upper-limit-fraction-of-threads

and adjusting the number of pods handling the 14 million tasks saved in our DB, we managed to eliminate the delays.

We kept the POC running for a month, and from our logs it was clear that db-scheduler ran across multiple pods, distributing the load equally among them, with no delays.
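The delay check described above (current time minus the planned execution time) can be sketched like this; in a real handler the planned time would come from the execution context:

```java
import java.time.Duration;
import java.time.Instant;

public class DelayCheck {

    // Delay = actual start time - planned execution time.
    public static long delayMillis(Instant plannedExecutionTime, Instant now) {
        return Duration.between(plannedExecutionTime, now).toMillis();
    }

    public static void main(String[] args) {
        Instant planned = Instant.parse("2021-07-01T12:00:00Z");
        Instant actual  = Instant.parse("2021-07-01T12:00:00.250Z");
        System.out.println(delayMillis(planned, actual) + " ms"); // 250 ms
    }
}
```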

We will start a new project soon to provide a scalable scheduler solution for our company and db-scheduler is the way to go.

Many thanks for your support @kagkarlsson and also for building this incredible solution.

@kagkarlsson
Owner

Good to hear! And just to let you know, I'm working on an improvement for your use case: many instances of the same recurring task with variable schedules: #257

@gianielsevier
Author

@kagkarlsson that's great, thanks for the feedback.
I was wondering if I could contribute to your repo by providing an example similar to the POC we did?

@kagkarlsson
Owner

Improved api released in 11.0.

I was wondering if I could contribute to your repo by providing an example similar to the POC we did?

I missed your comment here, sorry. If you have such code that you think might be valuable for people to see, how about pushing it to your own github-repo, and I can link from the README ? I can also add a link to this issue where you are describing your setup.

Also, if you are happy users, you are welcome to add your company to the list here:
https://github.com/kagkarlsson/db-scheduler#who-uses-db-scheduler
:)

@huynhnt

huynhnt commented Apr 11, 2023

I followed this guide and created the schedule this way, but I can't cancel the task in my Spring Boot project.

Can anyone help me?

```java
@PostMapping(path = "stop", headers = {"Content-type=application/json"})
public void stop(@RequestBody StartRequest request) {
    // TaskInstanceId.of(..) never returns null, so no null check is needed
    final TaskInstanceId scheduledExecution = TaskInstanceId.of("dynamic-recurring-task", RecurringTask.INSTANCE);
    System.out.println("TaskID: " + scheduledExecution.getId());
    schedulerClient.cancel(scheduledExecution);
}
```

@nj2208

nj2208 commented Nov 18, 2023


@gianielsevier

Thanks for providing a detailed explanation of your POC. We also have a similar use case. Would it be possible for you to share the example code you used in your POC?

Thanks in advance!

@nj2208

nj2208 commented Nov 21, 2023

@kagkarlsson Could you please share which example we can follow for a similar use case, to achieve very high throughput with short-running jobs that just post a message to a message broker?

@kagkarlsson
Owner

I think you will get the best throughput using PostgreSQL and .pollUsingLockAndFetch(1.0, 4.0) (thresholds tunable). Possibly increase the number of threads using .threads(xx) (or their spring-boot starter counterparts)
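As I read those fraction-based settings (my interpretation; exact semantics may vary between db-scheduler versions), `pollUsingLockAndFetch(lower, upper)` scales with the thread count: the scheduler fetches more work when fewer than `lower × threads` executions are queued, and fetches up to `upper × threads`. A quick illustration with hypothetical helper names:

```java
public class PollingThresholds {

    // Hypothetical helpers showing how the fractions scale with the thread count
    // (interpretation of the lower/upper-limit-fraction-of-threads settings).
    public static int lowerLimit(int threads, double lowerFraction) {
        return (int) (threads * lowerFraction);
    }

    public static int upperLimit(int threads, double upperFraction) {
        return (int) (threads * upperFraction);
    }

    public static void main(String[] args) {
        int threads = 20;
        // pollUsingLockAndFetch(1.0, 4.0): start fetching below 20 queued, fill up to 80
        System.out.println(lowerLimit(threads, 1.0)); // 20
        System.out.println(upperLimit(threads, 4.0)); // 80
    }
}
```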

@nj2208

nj2208 commented Nov 23, 2023

> I think you will get the best throughput using PostgreSQL and .pollUsingLockAndFetch(1.0, 4.0) (thresholds tunable). Possibly increase the number of threads using .threads(xx) (or their spring-boot starter counterparts)

Thanks a lot. Will use these settings in our PoC.

@kagkarlsson
Owner

Also make sure you have the necessary indices.
