
Background schedule pool fixed #1722

Merged

Conversation

silviucpp (Contributor)

Hello,

I addressed all the observations @ztlpn made on #1689.

Also, the replication bug he found was a regression caused by my previous fix: #1649

So the changes between the initial version and this one are:

  • Fix for: notification of the wakeup_event condition variable is not properly synchronized with the predicate checks. This can lead to deadlocks.
  • Fix for: the deactivated flag in the execute() method is not checked under the mutex.
  • Fix for: unclear semantics of the schedule() method – probably it should just schedule the task for background execution; why does the caller have to wait if the task is currently executing?

Basically, I added a new mutex (exec_mutex) that synchronizes the execute() function, so that only one execution can take place at a time; it is also taken when we deactivate a task, to make sure we wait for any in-progress execution.

The other mutex (schedule_mutex) is used only to synchronize the deactivated and scheduled flags, in order to make sure that:

  • we have only one schedule notification for the same task in the queue
  • when schedule() is called, all delayed schedules of the task are canceled

Also, because of these changes, there is no longer any need for a recursive mutex.
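
A minimal sketch of this locking scheme, assuming the member names above (illustrative, not the exact committed code):

#include <functional>
#include <mutex>

class TaskInfo
{
public:
    void deactivate()
    {
        /// Taking exec_mutex first waits for any in-progress execute() to finish.
        std::lock_guard<std::mutex> exec_lock(exec_mutex);
        std::lock_guard<std::mutex> schedule_lock(schedule_mutex);
        deactivated = true;
        scheduled = false;
    }

    void execute()
    {
        /// Only one execution of the task can be in flight at any time.
        std::lock_guard<std::mutex> exec_lock(exec_mutex);

        {
            std::lock_guard<std::mutex> schedule_lock(schedule_mutex);
            if (deactivated)
                return;
            scheduled = false;
        }

        function();
    }

private:
    std::function<void()> function;
    std::mutex exec_mutex;      /// serializes executions; deactivate() waits on it
    std::mutex schedule_mutex;  /// guards the deactivated and scheduled flags
    bool deactivated = false;
    bool scheduled = false;
};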

I verified that the bug @ztlpn reported is fixed, as well as the deadlock I found myself. I'll run more tests on the replication process in the following days and update the status here. In the meantime, please review and share any feedback.

Silviu

@silviucpp (Contributor Author)

Are you still interested in this pull request? I'm asking because it now conflicts with master, and if you no longer care about it I will close it.

@alexey-milovidov (Member)

I'm interested in this modification.

For example, if you run a heavily loaded server under ASan, it will complain about "thread limit exceeded" after a few days of running. And your modification will fix it.

But it's rather hard to implement without breaking something, and @ztlpn had a few complaints about this modification. Let @ztlpn restate those complaints and we will continue.

@silviucpp (Contributor Author)

OK, great! I'll wait for your feedback.

@ztlpn (Contributor) commented Mar 16, 2018

For example, if you run a heavily loaded server under ASan, it will complain about "thread limit exceeded" after a few days of running. And your modification will fix it.

In fact, this patch will not help, because the root cause is the several threads created for executing each query, not the threads doing replication (there can be a lot of those, but at least their number is stationary).

@@ -193,6 +193,9 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree(
shutdown_event(false), part_check_thread(*this),
log(&Logger::get(database_name + "." + table_name + " (StorageReplicatedMergeTree)"))
{
initMergeSelectSession();
merge_selecting_handle = context_.getSchedulePool().addTask("StorageReplicatedMergeTree", [this] { mergeSelectingThread(); });
Contributor

The task name is misleading - better name it something like StorageReplicatedMergeTree::mergeSelectingThread

Contributor Author

Are you talking about the StorageReplicatedMergeTree name? I used the same name as in the logs, which is also the class name. I will change it to "mergeSelectingThread", or better, "mergeSelectingTask". Please confirm.

Contributor

Are you talking about the StorageReplicatedMergeTree name?

Yes. In contrast to, say, ReplicatedMergeTreePartCheckThread, StorageReplicatedMergeTree is a huge class, and it is not evident from the current name what part of its functionality this task is responsible for.

I like StorageReplicatedMergeTree::mergeSelectingTask better than just mergeSelectingTask because it points more directly at the place in the code where this task resides. It may also be a good idea to add the table name, just as is done for the logger name.

Anyway, I've noticed that the names are only used to log slow executions, so being verbose probably won't hurt.
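
For illustration, the verbose name could look roughly like this (a sketch based on the addTask() call in the diff above; the exact string format is an assumption):

merge_selecting_task_handle = context_.getSchedulePool().addTask(
    database_name + "." + table_name + " (StorageReplicatedMergeTree::mergeSelectingTask)",
    [this] { mergeSelectingThread(); });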

Contributor Author

OK, this point is clear. I'll change it.

Contributor Author

Fixed in commit 6629b03

Contributor

👍

std::function<std::pair<String, String>(const MergeTreeData::DataPartPtr &, const MergeTreeData::DataPartPtr &)> merge_sel_merging_predicate_args_to_key;
std::chrono::steady_clock::time_point merge_sel_now;
std::unique_ptr<CachedMergingPredicate<std::pair<std::string, std::string>> > merge_sel_cached_merging_predicate;
std::function<bool(const MergeTreeData::DataPartPtr &, const MergeTreeData::DataPartPtr &)> merge_sel_can_merge;
Contributor

The fact that we need a lot of fields here with the common prefix suggests that we should encapsulate the context of the merge selecting task in its own class.

Contributor Author

Yes, true. I'll make this change.

Contributor Author

@ztlpn, looking at the code, this seems not so trivial. merge_sel_uncached_merging_predicate and merge_sel_can_merge are functions whose implementations use a lot of members from StorageReplicatedMergeTree.

How would you prefer to deal with this? I want to avoid passing a reference to StorageReplicatedMergeTree into the new context class and making them friend classes (as I saw in a lot of places in the code), because that would violate encapsulation, and the code wouldn't be better than it is now.

Contributor

I think even a separate class that is a friend of StorageReplicatedMergeTree is better in terms of encapsulation. Yes, it can access private members of StorageReplicatedMergeTree, but StorageReplicatedMergeTree can't access its private members! As opposed to the current solution where everybody can access anything.
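
A small sketch of that one-way asymmetry (the member names are illustrative):

class ReplicatedMergeTreeMergeSelectingThread;

class StorageReplicatedMergeTree
{
    /// The task class may look inside the storage...
    friend class ReplicatedMergeTreeMergeSelectingThread;

private:
    int storage_internals = 0;
};

class ReplicatedMergeTreeMergeSelectingThread
{
private:
    /// ...but the storage cannot reach back into the task's private state,
    /// unlike the flat merge_sel_* fields, which anything inside
    /// StorageReplicatedMergeTree can touch.
    int cached_predicate_state = 0;
};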

Contributor Author

Should I add the new class StorageReplicatedMergeTreeContext to the same header, StorageReplicatedMergeTree.h, or do you want a separate file?

Contributor Author

OK, no problem. Though I think calling all of these *Thread is no longer accurate :) *Task or something similar would be better, since there is no longer any actual thread there. Of course, I can't rename all the existing ones now, because I would shoot myself in both legs: merges with other branches are a headache.

Contributor

Well in some sense these tasks are still sequential threads of execution, just not represented as OS threads... And I agree that renaming it now would be painful.

Contributor Author

OK, I will make some time in the coming days to fix this (from what I can see, the last remaining task). I will call the new class ReplicatedMergeTreeMergeSelectingThread and place it in the Storages/MergeTree directory.

I also need to bring the branch up to date with master again. I see an ugly conflict there; I need to figure out how to fix it.

Contributor Author

Hello,

In commit f811da7 you can find the version of the fix where all the merge_sel_* members are moved into ReplicatedMergeTreeMergeSelectingThread.

What I haven't done yet is move this class into its own file. There I have a problem that can be fixed in several ways, and I'm not sure which you would prefer.

The problem is that canMergePartsAccordingToZooKeeperInfo is also used at line 2456, and this function is local to this file.

What I can do is move canMergePartsAccordingToZooKeeperInfo, partsWillNotBeMergedOrDisabled, and CachedMergingPredicate into the new file, and then either:
1) make the first function a member of ReplicatedMergeTreeMergeSelectingThread so it can be used from StorageReplicatedMergeTree as well, or
2) create in ReplicatedMergeTreeMergeSelectingThread another version of can_merge, similar to the one in StorageReplicatedMergeTree at line 2456.

Please review this commit and let me know how to proceed: one of the above options, or another idea.

Silviu

Contributor Author

@ztlpn, any input on this? I want to finish it and then merge with master, since a lot of conflicts are already accumulating.

if (check_period_ms > static_cast<Int64>(storage.data.settings.check_delay_period) * 1000)
check_period_ms = storage.data.settings.check_delay_period * 1000;

storage.queue_updating_task_handle = storage.context.getSchedulePool().addTask("queue_updating_task_handle", [this]{ storage.queueUpdatingThread(); });
Contributor

The task is inconsistently named (after the variable name, not the function name).

Contributor Author

I'll change it to queueUpdatingThread. Please confirm.

@ztlpn (Contributor) Mar 16, 2018

StorageReplicatedMergeTree::queueUpdatingTask (possibly with the table name)

Contributor Author

OK, this is clear!

Contributor Author

Fixed in commit 6629b03

Contributor

👍

if (min_time > current_time)
{
wakeup_event.wait_for(lock, std::chrono::microseconds(min_time - current_time));
continue;
@ztlpn (Contributor) Mar 16, 2018

This pattern is a bit strange - after the thread is awakened, it will be granted delayed_tasks_lock, then it will release it, only to try and immediately reacquire it on line 230.

But probably this doesn't matter for correctness.

Contributor Author

You are right that in this case the lock is released and then reacquired. But we cannot just schedule the task here: it may have been removed from the queue, which is why we go through the checks again.

I need to think about whether calling the active() method on the task is enough, or whether there are other implications.

@ztlpn (Contributor) Mar 16, 2018

Yes, we need to go through all the checks after returning from the wait() method (if only because of the possibility of a spurious wakeup).

My point was that it is not necessary to release the lock and immediately re-acquire it to check the condition - the lock is already acquired after we return from the wait() method.

Contributor Author

@ztlpn, I'm afraid it's not clear to me how you want this fixed: how can I go through all the checks while not releasing the lock that is already acquired?

Can you provide some hints?

@ztlpn (Contributor) Mar 26, 2018

Possible solution:
Here is the current code (in pseudocode):

while (!shutdown)
{
    {
        acquire lock;
        
        if (shutdown)
            break;

        if (no tasks)
        {
            condition.wait();
            continue;
        }
    }

    schedule task;
}

Every time we do continue; we release the lock only to immediately re-acquire it.

Now what if we add a second loop:

while (!shutdown)
{
    {
        acquire lock;

        while (!shutdown)
        {
            if (no tasks)
            {
                condition.wait();
                continue;
            }
            else
                break;
        }
    }

    schedule task;
}

This way we still hold the lock after continuing and we execute the inner loop until we get a task to schedule. The lock is released only to wait on the condition.

What do you think?

Contributor

By the way, there is a bug in the exit predicate (line 265) - it should be if (shutdown), not if (!shutdown)...

Contributor Author

Fixed in 7d6268c

Contributor Author

Do you see any issue with this implementation?

void BackgroundSchedulePool::delayExecutionThreadFunction()
{
    setThreadName("BckSchPoolDelay");

    while (!shutdown)
    {
        TaskHandle task;
        bool found = false;

        {
            std::unique_lock lock(delayed_tasks_lock);

            while (!shutdown)
            {
                Poco::Timestamp min_time;

                if (!delayed_tasks.empty())
                {
                    auto t = delayed_tasks.begin();
                    min_time = t->first;
                    task = t->second;
                }

                if (!task)
                {
                    /// Nothing is delayed yet - wait for the first task to arrive.
                    wakeup_cond.wait(lock);
                    continue;
                }

                Poco::Timestamp current_time;

                if (min_time > current_time)
                {
                    /// The earliest task is not due yet - sleep until its deadline
                    /// (or until a new task or shutdown wakes us).
                    wakeup_cond.wait_for(lock, std::chrono::microseconds(min_time - current_time));
                    continue;
                }
                else
                {
                    /// We have a task ready for execution.
                    found = true;
                    break;
                }
            }
        }

        if (found)
            task->schedule();
    }
}

Contributor

Looks good.

Contributor Author

Fixed in 0a05769


if (!task)
{
wakeup_event.wait(lock);
@ztlpn (Contributor) Mar 16, 2018

There is a small possibility of deadlock on shutdown here. If this thread checks the shutdown variable and then sleeps for some reason before waiting on the condition, it can miss the notification from the destructor (line 142) and wait forever.

The wakeup_event field is misnamed - it is a std::condition_variable, not Poco::Event (which has different semantics and will save notifications even when no one is waiting for them).

Contributor Author

We can fix the deadlock by adding std::lock_guard lock(delayed_tasks_lock); in the destructor before shutdown = true. What do you think?

Contributor

That would still leave a tiny window for deadlock, because the shutdown flag is not checked under the lock in the delayExecutionThreadFunction loop.

Contributor Author

Right, we can also add a check under the lock, on the first line.
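
A minimal sketch of the destructor side of this fix, assuming the member names from the snippets above (shutdown, delayed_tasks_lock, wakeup_event) and omitting thread joining:

BackgroundSchedulePool::~BackgroundSchedulePool()
{
    {
        /// Set the flag under the same mutex the waiter holds while checking it,
        /// so the notification cannot fall between the waiter's check and its wait().
        std::lock_guard<std::mutex> lock(delayed_tasks_lock);
        shutdown = true;
    }
    wakeup_event.notify_all();

    /// ... join the delayed-execution thread and the pool threads ...
}

With the waiter re-checking shutdown under the same lock (as in the delayExecutionThreadFunction loop shown earlier), the wakeup can no longer be lost.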

Contributor Author

Fixed in commit 6629b03

Contributor

👍

}


void BackgroundSchedulePool::scheduleDelayedTask(const TaskHandle & task, size_t ms, std::lock_guard<std::mutex>&)
@ztlpn (Contributor) Mar 16, 2018

Small deviation from the style guide (the space before the & ;))

Also, I think the code is more readable when the lock argument is explicitly named to document exactly which mutex needs to be locked.

Contributor Author

This is a change made by @alexey-milovidov in his first review. I can add a variable name there, but it will be unused inside the method. I think that's why he chose not to give it a name.

Contributor

Ah yes, the variable will be unused. Its name can be put in a comment though:
, std::lock_guard<std::mutex> & /* schedule_mutex_lock */)

Contributor Author

OK, this point is clear!

Contributor Author

Fixed in commit 6629b03

Contributor

👍

@@ -444,6 +462,11 @@ bool ZooKeeper::exists(const std::string & path, Stat * stat_, const EventPtr &
return existsWatch(path, stat_, callbackForEvent(watch));
}

bool ZooKeeper::exists(const std::string & path, Stat * stat, const TaskHandlePtr & watch)
Contributor

I don't think this class should know about TaskHandles. Can we use the generic callback interface (with TaskHandles converted to callbacks outside the ZooKeeper class)?

Contributor Author

This is a bit tricky. If we want a uniform implementation, we need two implementations of that interface: one using EventPtr and one using TaskHandlePtr, because both functions do the same thing and only signal completion via a different method.

What do you think?

Contributor

We should have two implementations of each method: one with EventPtr and one with WatchCallback (the generic callback interface). Clients of BackgroundSchedulePool that want their tasks to be scheduled when a watch fires should create the WatchCallbacks themselves. This way ZooKeeper and BackgroundSchedulePool can be decoupled.
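
A hedged sketch of the decoupling, with stand-in types (the real zkutil::WatchCallback signature may differ):

#include <functional>
#include <memory>
#include <string>

struct Task { void schedule() { /* enqueue into the pool */ } };
using TaskHandle = std::shared_ptr<Task>;
using WatchCallback = std::function<void()>;  /// stand-in for the generic interface

/// The ZooKeeper-side method knows only about generic callbacks...
bool existsWatch(const std::string & path, const WatchCallback & watch_callback)
{
    /// The real implementation registers the watch with ZooKeeper; stubbed here.
    (void)path; (void)watch_callback;
    return false;
}

/// ...and a BackgroundSchedulePool client converts its task handle itself:
void setWatchOnTask(const TaskHandle & task)
{
    existsWatch("/clickhouse/some_node", [task] { task->schedule(); });
}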

Contributor Author

OK, noted!

Contributor Author

Fixed in commit a2dc16a

Contributor

👍

CurrentMetrics::Increment metric_increment{CurrentMetrics::BackgroundSchedulePoolTask};

Stopwatch watch;
function();
Contributor

As the scheduled flag is false here, this task can be scheduled once more from another thread. If a thread in the pool is available, it will immediately try to execute the task but will get stuck waiting on exec_mutex. That is good, because we don't want to execute the function more than once at any point in time.

But it is bad from a performance point of view, because the executing task will occupy not one but two threads in the pool, which are in short supply. We should try to minimize waiting in BackgroundSchedulePool::threadFunction().

Contributor Author

Any idea how we can do this safely?

Contributor

The problem is that the pool has no idea whether the task is currently executing.

If we think of a task more as a state machine... Here are the current states: Idle, Scheduled, Executing, Delayed. We want to go from Executing to Scheduled, and we need to do it without waiting (that was the problem with the first version). But if we transition the task to the Scheduled state, it will immediately be picked up for execution by another thread that will then get stuck on exec_mutex (the current problem).

So a possible path to a solution is to add one more state, ExecutingScheduled (maybe just another flag), and to check after executing the task whether it was scheduled once more in the meantime. If it was, execute it again right away (possibly in the same thread!). See the sketch after this comment.

The same goes for the situation when we delay the currently executing task by some small amount and this delay expires before the task finishes.
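
A toy sketch of that extra state, assuming the exec_mutex/schedule_mutex split from this PR; the executing flag and the re-run loop are the suggestion, not the actual commit:

#include <functional>
#include <mutex>

class Task
{
public:
    explicit Task(std::function<void()> f) : function(std::move(f)) {}

    /// Never blocks on a running execution.
    void schedule()
    {
        std::lock_guard<std::mutex> lock(schedule_mutex);
        if (scheduled)
            return;            /// at most one pending request per task
        scheduled = true;
        if (!executing)
            enqueue();         /// hand the task to a pool thread (stub below)
        /// if executing: the running thread will notice `scheduled` and re-run itself
    }

    /// Called from a pool thread.
    void execute()
    {
        std::lock_guard<std::mutex> exec_lock(exec_mutex);

        bool run_again = true;
        while (run_again)
        {
            {
                std::lock_guard<std::mutex> lock(schedule_mutex);
                executing = true;
                scheduled = false;   /// consume the request we are about to serve
            }

            function();

            std::lock_guard<std::mutex> lock(schedule_mutex);
            executing = false;
            run_again = scheduled;   /// re-scheduled mid-run: loop in the same thread
        }
    }

private:
    void enqueue() { /* push into the pool's task queue */ }

    std::function<void()> function;
    std::mutex exec_mutex;       /// serializes executions (from the PR)
    std::mutex schedule_mutex;   /// guards the flags (from the PR)
    bool scheduled = false;
    bool executing = false;      /// the suggested extra state
};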

Contributor Author

Should be fixed in 24de8d6

Contributor

👍

@ztlpn (Contributor) commented Mar 16, 2018

Please rebase the branch or merge the latest stable into it so that the changes are easily testable.

@silviucpp (Contributor Author)

Hello @ztlpn,

I'll try to rebase onto the latest master, but there are conflicts in several places that are not straightforward to fix. I first need to review what changed in the meantime and then work on this to make sure I don't break something.

Last time I tried, I failed :) maybe I was too tired. I'll try again next week. Unfortunately, this is the problem when pull requests stay without review for a long time.

@silviucpp (Contributor Author)

@ztlpn, how many threads are created for each INSERT and SELECT query? Is it a fixed number, or does it depend on the query?

@ztlpn (Contributor) commented Mar 16, 2018

How many threads are created for each INSERT and SELECT query? Is it a fixed number, or does it depend on the query?

Depends on the query and settings. Typically, to execute a SELECT query, ClickHouse will create around max_threads new threads that will then immediately exit. But if it is a Distributed query, a thread is created for each shard, and that can be hundreds.

This is not considered a big problem, because thread creation on Linux is very efficient. But these threads specifically are the reason ASan hits its thread limit (not the replication threads). A possible fix is to take query execution threads from some other thread pool with elastic resizing and a high upper thread limit. That's a separate issue though :)

@ztlpn (Contributor) left a comment

@silviucpp force-pushed the background-schedule-pool-fix branch from c90c8db to a2dc16a (March 22, 2018 15:39)
@silviucpp (Contributor Author)

@ztlpn The branch no longer has conflicts with master.

…threads in the pool which are in short supply.
@silviucpp (Contributor Author)

@alexey-milovidov @ztlpn: Except for two points where I added comments, because it is not clear enough to me how you expect them to be implemented, everything should be fixed and conflict-free with master. Once you clarify those two points, I'll fix them and run a load test and some functionality tests again.

@@ -163,7 +164,7 @@ class ReplicatedMergeTreeQueue
* If next_update_event != nullptr, will call this event when new entries appear in the log.
* Returns true if new entries have been.
*/
bool pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper, zkutil::EventPtr next_update_event);
bool pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper, BackgroundSchedulePool::TaskHandle next_update_event);
Contributor

Should be next_update_task.

Contributor

Or _task_handle. Both are OK as long as you name all of them consistently (currently _handle is also used in a couple of places).

Contributor Author

Fixed in 31874ed

Contributor

What about merge_selecting_handle?

Contributor Author

Fixed in 5099284

Contributor

Oops, you misunderstood me :) next_update_task_handle is perfectly OK, I was talking about the StorageReplicatedMergeTree member variable which is inconsistently named.

I won't comment several times if there is a common issue in several separate places in the code.

Contributor Author

I reverted the previous commit. Do you mean to replace merge_selecting_handle with merge_selecting_task_handle?

Contributor

Yes

Contributor Author

Fixed in 1418e33

Contributor

👍

@@ -161,9 +160,11 @@ class ZooKeeper
int32_t tryRemoveEphemeralNodeWithRetries(const std::string & path, int32_t version = -1, size_t * attempt = nullptr);

bool exists(const std::string & path, Stat * stat = nullptr, const EventPtr & watch = nullptr);
bool exists(const std::string & path, Stat * stat, const WatchCallback & watch_callback);
Contributor

There is no need for this method as it is the same as existsWatch(). The reason for the -Watch suffix is that overload resolution becomes somewhat difficult because there is the other method with a lot of default parameters.

Anyway, the interface for get(), tryGet() and exists() should be consistent. So you should probably add and use a getWatch() method, delete this exists() overload, and use the existsWatch() method.

@silviucpp (Contributor Author) Mar 26, 2018

Fixed in 4361df9. Here we need to take into account that libzookeeper will be removed soon, and the new implementation that @alexey-milovidov wrote has a slightly different interface. We will see what conflicts we get on merge.

Contributor

Also, for consistency, getWatch() instead of get().

Yes, there will be some conflicts. Who knows which pull will get merged first :)

Contributor Author

Fixed in 438121e

Contributor

👍

Silviu Caragea added 9 commits March 26, 2018 22:37
…task_handle to merge_selecting_handle) (reverted from commit 5099284)
# Conflicts:
#	dbms/src/Common/ZooKeeper/LeaderElection.h
#	dbms/src/Common/ZooKeeper/ZooKeeper.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.h
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreePartCheckThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp
#	dbms/src/Storages/StorageReplicatedMergeTree.cpp
@silviucpp (Contributor Author)

Branch merged with master

@alexey-milovidov (Member)

This is almost ready for merge. Alexey @ztlpn is doing final reviews.

Silviu Caragea added 2 commits April 19, 2018 09:26
@silviucpp (Contributor Author)

Great! I merged with master again to fix all the conflicts.

Silviu Caragea added 2 commits April 24, 2018 17:52
…-pool-fix

# Conflicts:
#	dbms/src/Common/ZooKeeper/LeaderElection.h
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreePartCheckThread.cpp
#	dbms/src/Storages/StorageReplicatedMergeTree.cpp
@silviucpp (Contributor Author)

@ztlpn @alexey-milovidov From what I see, ReplicatedMergeTreeCleanupThread is called too often even when the table is idle and no read or write has happened. This task is not triggered by any action that can create data that needs to be cleaned up; it simply reschedules itself at a regular interval.

This takes a lot of CPU, and I think it's better to change it to be triggered only when actions that can generate obsolete data in a table take place.
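
A hypothetical sketch of the triggered approach (onPartsCommitted is an illustrative name, not an actual hook; TaskHandle and schedule() are from this PR):

/// Hypothetical: code paths that produce obsolete data wake the cleanup task,
/// instead of the task polling itself on a fixed interval.
void onPartsCommitted(const BackgroundSchedulePool::TaskHandle & cleanup_task_handle)
{
    cleanup_task_handle->schedule();
}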

@alexey-milovidov (Member)

This takes a lot of CPU, and I think it's better to change it to be triggered only when actions that can generate obsolete data in a table take place.

That's correct, it will be much better.

@silviucpp (Contributor Author)

The same problem exists for merge_selecting_task_handle: on idle tables, because there is nothing to merge, it reschedules itself every 5 seconds (MERGE_SELECTING_SLEEP_MS).

Another one is ReplicatedMergeTreeRestartingThread, which checks whether the ZooKeeper session has expired. This one also reschedules itself very aggressively. @alexey-milovidov, now that you have rewritten the ZooKeeper implementation, is there no other way to detect when the session has expired without polling?
