Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUES-2259 support truncate syntax #2260

Merged

Conversation

zhang2014
Copy link
Contributor

@zhang2014 zhang2014 commented Apr 22, 2018

#2259

CheckList

  • StorageBuffer
  • StorageCatBoostPool
  • StorageDictionary
  • StorageDistributed
  • StorageFile
  • StorageKafka
  • StorageLog
  • StorageMaterializedView
  • StorageMemory
  • StorageMerge
  • StorageMergeTree
  • StorageMySQL
  • StorageNull
  • StorageODBC
  • StorageReplicatedMergeTree
  • StorageSet
  • StorageStripeLog
  • StorageTinyLog
  • StorageView
    I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@@ -126,6 +126,13 @@ void StorageMergeTree::drop()
data.dropAllData();
}

void StorageMergeTree::truncate()
{
shutdown();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov shutdown is synchronous? What would I do if it wasn't ? Because I need to know when the merge task is terminated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's synchronous.

@alexey-milovidov
Copy link
Member

Please check for ReplicatedMergeTree tables. The implementation shouldn't be trivial for them.

@zhang2014
Copy link
Contributor Author

The current idea is based on the ReplicatedMergeTreeLogEntryData::TRUNCATE_TABLE command

@zhang2014 zhang2014 force-pushed the feature/support_truncate branch 6 times, most recently from a7b8106 to ad40fcf Compare April 24, 2018 12:41
@zhang2014 zhang2014 force-pushed the feature/support_truncate branch 3 times, most recently from 00588d4 to f7fb7ab Compare May 2, 2018 01:32
@zhang2014 zhang2014 force-pushed the feature/support_truncate branch 5 times, most recently from fbdfa12 to 9db8f35 Compare May 5, 2018 16:06
@zhang2014 zhang2014 changed the title [WIP] ISSUES-2259 support truncate syntax ISSUES-2259 support truncate syntax May 5, 2018
@zhang2014
Copy link
Contributor Author

I'm done. Please help review the changes. @alexey-milovidov @ztlpn

@zhang2014 zhang2014 force-pushed the feature/support_truncate branch 3 times, most recently from 31c0789 to 8936d88 Compare May 5, 2018 18:13
@@ -54,6 +54,9 @@ using SetResponse = ZooKeeperImpl::ZooKeeper::SetResponse;
using ListResponse = ZooKeeperImpl::ZooKeeper::ListResponse;
using CheckResponse = ZooKeeperImpl::ZooKeeper::CheckResponse;

template <typename R>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment.

@@ -54,6 +54,9 @@ using SetResponse = ZooKeeperImpl::ZooKeeper::SetResponse;
using ListResponse = ZooKeeperImpl::ZooKeeper::ListResponse;
using CheckResponse = ZooKeeperImpl::ZooKeeper::CheckResponse;

template <typename R>
using MultiAsyncResponse = std::vector<std::pair<std::string, std::future<R>>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad name. It shouldn't mess up with "multi" operation in ZooKeeper.
More correct name: AsyncResponses.

alexey-milovidov added a commit that referenced this pull request Jun 9, 2018
alexey-milovidov added a commit that referenced this pull request Jun 9, 2018
alexey-milovidov added a commit that referenced this pull request Jun 9, 2018
@alexey-milovidov alexey-milovidov merged commit 3afb335 into ClickHouse:master Jun 9, 2018
/// Delete metadata, the deletion of which differs from the recursive deletion of the directory, if any.
virtual void drop() = 0;
/// Delete database metadata, if exists.
virtual void drop(Context & context)
Copy link
Member

@alexey-milovidov alexey-milovidov Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method body should be in .cpp

This method is not quite correct, because only Ordinary databases have metadata for tables as files in metadata directory. Other databases doesn't require removal of metadata from filesystem.

Creating and removing database.sql files is not the responsibility of Database itself.
For example, database doesn't create this file, so it shouldn't remove it.

Another example: tables (Storages) are not responsible for creating and removing .sql files - it is the responsibility of Database. By the same principle, Database is not responsible for creating and removing its .sql files - this is the responsibility of caller.

If this method removes internal metadata directory, why it doesn't remove internal data directory?

Fixed.

@@ -0,0 +1,68 @@
#include <Interpreters/ClusterProxy/TruncateStreamFactory.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't proxy TRUNCATE query for Distributed table, because query ON CLUSTER already do it.

String current_table_name = table.first->getTableName();

if (drop.detach)
else if (kind == ASTDropQuery::Kind::Truncate)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkTableCanBeDropped should be called for truncate, because it is equally dangerous operation.

String getID() const override { return (detach ? "DetachQuery_" : "DropQuery_") + database + "_" + table; };
String getID() const override
{
if (kind == ASTDropQuery::Kind::Drop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-trivial methods should be in .cpp.

<< (detach ? "DETACH TABLE " : "DROP TABLE ")
<< (if_exists ? "IF EXISTS " : "") << (settings.hilite ? hilite_none : "")
<< (!database.empty() ? backQuoteIfNeed(database) + "." : "") << backQuoteIfNeed(table);
settings.ostr << (settings.hilite ? hilite_keyword : "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste.

/** Delete the table data. Called before deleting the directory with the data.
* If you do not need any action other than deleting the directory with data, you can leave this method blank.
*/
virtual void truncate(const ASTPtr & /*query*/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a copy-paste of the above. This may lead to confusion.
I've totally rewrite this comment.

@@ -797,7 +797,16 @@ void MergeTreeData::dropAllData()

LOG_TRACE(log, "dropAllData: removing data from filesystem.");

Poco::File(full_path).remove(true);
std::vector<Poco::File> data_dirs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation is unclear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comments.

alexey-milovidov added a commit that referenced this pull request Jun 9, 2018
}
}

ClusterProxy::TruncateStreamFactory truncate_stream_factory(cluster);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this.


void StorageMergeTree::truncate(const ASTPtr & /*query*/)
{
merger.merges_blocker.cancelForever();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the merges are supposed to work after truncate?


if (fake_part_name.empty())
if (dropBlocksInPartition(*zookeeper, partition_id, entry, detach))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropParts, not Blocks.

@@ -3808,14 +3802,18 @@ void ReplicatedMergeTreeMergeSelectingThread::clearState()
{
deduplicate = false; /// TODO: read deduplicate option from table config

uncached_merging_predicate = [this](const MergeTreeData::DataPartPtr & left, const MergeTreeData::DataPartPtr & right)
uncached_merging_predicate = [this](const MergeTreeData::DataPartPtr &left, const MergeTreeData::DataPartPtr &right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification.

{
return canMergePartsAccordingToZooKeeperInfo(left, right, storage->getZooKeeper(), storage->zookeeper_path, storage->data);
return canMergePartsAccordingToZooKeeperInfo(left,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification.

};

merging_predicate_args_to_key = [](const MergeTreeData::DataPartPtr & left, const MergeTreeData::DataPartPtr & right)
merging_predicate_args_to_key = [](const MergeTreeData::DataPartPtr &left, const MergeTreeData::DataPartPtr &right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification.

{
return std::make_pair(left->name, right->name);
return std::make_pair(left->name, right->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification. Wrong indentation.

@@ -3824,11 +3822,49 @@ void ReplicatedMergeTreeMergeSelectingThread::clearState()

now = std::chrono::steady_clock::time_point();

can_merge = [&] (const MergeTreeData::DataPartPtr & left, const MergeTreeData::DataPartPtr & right, String *)
can_merge = [&](const MergeTreeData::DataPartPtr &left, const MergeTreeData::DataPartPtr &right, String *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification.

return partsWillNotBeMergedOrDisabled(left, right, storage->queue)
&& cached_merging_predicate->get(now, uncached_merging_predicate, merging_predicate_args_to_key, left, right);
return partsWillNotBeMergedOrDisabled(left, right, storage->queue)
&& cached_merging_predicate->get(now, uncached_merging_predicate, merging_predicate_args_to_key, left, right);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidential modification.

alexey-milovidov added a commit that referenced this pull request Jun 9, 2018
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jun 9, 2018

I would like to prefer different method of implementation:

There is default implementation for truncate, that simply do drop and create table under lock.
All tables will use default implementation except ReplicatedMergeTree.

Cons:

  • the lock will be in global scope, not for table scope;
  • metadata_modification_time will be updated;
  • this method is less atomic: the table may become dropped instead of truncated when server was died in the middle;

Pros:

  • simple implementation, almost no additional logic.

alexey-milovidov added a commit that referenced this pull request Jun 10, 2018
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jun 10, 2018

@zhang2014

Thank you! This is definitely useful feature.
What to do further... For example, our users have a request to implement
CREATE OR REPLACE for tables and especially for views.

This should be done atomically (from the client point of view). It can be done under global (context) lock.

@zhang2014
Copy link
Contributor Author

zhang2014 commented Jun 11, 2018

@alexey-milovidov Thank for you review, I have learned a lot.
If necessary, I will continue to follow up and fix these problems, But it may be a few days later, The TCP ClickHouse JDBC Driver we implemented will be released recently : )
cc @sundy-li

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants