-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring to perform cardinality estimation specifically for YT. #4482
Conversation
⚪ |
⚪ |
⚪ |
⚪
|
⚪
|
TOptimizerStatistics() : KeyColumns(EmptyColumns) {} | ||
TOptimizerStatistics(double nrows, int ncols): Nrows(nrows), Ncols(ncols), KeyColumns(EmptyColumns) {} | ||
TOptimizerStatistics(double nrows, int ncols, double cost): Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(EmptyColumns) {} | ||
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double cost): Type(type), Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(EmptyColumns) {} | ||
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost): Type(type), Nrows(nrows), Ncols(ncols), ByteSize(byteSize), Cost(cost), KeyColumns(EmptyColumns) {} | ||
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double cost, const TVector<TString>& keyColumns): Type(type), Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(keyColumns) {} | ||
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost, const TVector<TString>& keyColumns): Type(type), Nrows(nrows), Ncols(ncols), ByteSize(byteSize), Cost(cost), KeyColumns(keyColumns) {} | ||
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost, const TVector<TString>& keyColumns, IProviderStatistics* specific) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Почему нельзя все сделать одним конструктором с дефолтными значениями аргументов?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Undefined, | ||
LookupJoin, | ||
MapJoin, | ||
GraceJoin, | ||
StreamLookupJoin //Right part can be updated during an operation. Used mainly for joining streams with lookup tables. Currently impplemented in Dq by LookupInputTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Здесь перечислены только DQ-специфичные алгоритмы. В YT провайдере есть другие реализации. Что с ними предполагается делать? Как будет расширяться этот enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Предполагается что enum будет общим для DQ и YT. Если YT понадобится алгоритм джойна, который здесь не перечислен, то для DQ IsJoinApplicable() должен вернуть false для этого алгоритма.
Сейчас YT делает либо Map join, либо merge join (GraceJoin в этом енуме).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GraceJoin больше соответствует Common стратегии. Не понятно, как тут различать merge и common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Добавил MergeJoin
В качестве альтернативы, можно привязать алгоритмы все джойна к системе, в которой они выполняются:
DqLookupJoin,
DqMapJoin,
..
YtMapJoin,
YtMergeJoin
чтобы было понятно из значения enum-а как конкретно он будет исполняться. Но мне кажется, это излишнее усложнение.
TJoinColumn(TString relName, TString attributeName) : RelName(relName), | ||
AttributeName(attributeName) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -37,7 +23,7 @@ bool NDq::operator < (const NDq::TJoinColumn& c1, const NDq::TJoinColumn& c2) { | |||
return false; | |||
} | |||
|
|||
TString NYql::ConvertToJoinAlgoString(EJoinAlgoType joinAlgo) { | |||
TString ConvertToJoinAlgoString(EJoinAlgoType joinAlgo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Используй GENERATE_ENUM_SERIALIZATION в ya.make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Эта функция используется для вывода в std::stringstream (как и ConvertToJoinString) который не поддерживается GENERATE_ENUM_SERIALIZATION
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не понял про поддержку. Ты будешь просто писать ToString(joinAlgo) вместо ConvertToJoinAlgoString(joinAlgo). И ToString() тебе сгенерит GENERATE_ENUM_SERIALIZATION, а не руками перечислять все константы
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done.
const std::set<std::pair<NDq::TJoinColumn, NDq::TJoinColumn>>& joinConditions, | ||
EJoinAlgoType joinAlgo) const override; | ||
|
||
static const TBaseProviderContext& instance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -109,4 +109,123 @@ void TJoinOptimizerNode::Print(std::stringstream& stream, int ntabs) { | |||
RightArg->Print(stream, ntabs+1); | |||
} | |||
|
|||
bool IsPKJoin(const TOptimizerStatistics& stats, const TVector<TString>& joinKeys) { | |||
if (stats.KeyColumns.size()==0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: пробелы вокруг ==
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return false; | ||
} | ||
|
||
for(size_t i=0; i<stats.KeyColumns.size(); i++){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: пробелы вокруг =, < и ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
⚪
|
⚪
|
e277117
to
14c68ba
Compare
⚪
|
⚪
|
⚪
|
⚪
|
Reviewed in
#3866
but accidentally closed that PR - reopening.