-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Add TPC-DS connector #23067
base: master
Are you sure you want to change the base?
Conversation
69d9c1b
to
a88f29d
Compare
a88f29d
to
03214b7
Compare
8734eef
to
3f40e32
Compare
Consider updating the Use Cases in the Presto C++ section of the Presto doc to add TPC-DS to the supported connectors. |
4d36367
to
375dc59
Compare
375dc59
to
c9e0947
Compare
64ab39e
to
f08bd87
Compare
Consider a release note entry similar to the following:
|
dc5f0e2
to
8a5fe7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local docs build, looks good. Thanks!
32aa4e4
to
42377b8
Compare
1f7db4d
to
1b86b4d
Compare
6d9a0c5
to
e69c19d
Compare
@aditi-pandit @yingsu00 |
FYI: @czentgr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya and @pdabre12.
Have done a quick round of review.
presto-native-execution/presto_cpp/main/connectors/tpcds/dsdgen/dsdgen-c/StringBuffer.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/CMakeLists.txt
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsGen.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsGen.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsGen.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work done here!
# Without this hack, there are multiple link errors similar to the one below | ||
# only on GCC. "undefined reference to `vtable for | ||
# velox::connector::tpcds::TpcdsTableHandle`. TODO: Fix this hack. | ||
target_link_libraries(velox_exec_test_lib presto_tpcds_connector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
velox_exec_test_lib
is a collection of velox utilities.
Which objects gets the undefined reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gives an undefined reference to TpcdsTableHandle
.
It's similar to
# "undefined reference to `vtable for velox::connector::tpch::TpchTableHandle`" |
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsConnector.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsConnector.h
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsGen.cpp
Outdated
Show resolved
Hide resolved
|
||
Table fromTableName(std::string_view tableName); | ||
|
||
RowVectorPtr genTpcdsCallCenter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all have the same signature but only a different name.
You can use a macro to define the signature and then just generate the signature for each name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the code a little bit, can you take another look now?
presto-tpcds/src/main/java/com/facebook/presto/tpcds/TpcdsMetadata.java
Outdated
Show resolved
Hide resolved
@aditi-pandit @czentgr. |
ad02d92
to
6f39bf1
Compare
@czentgr @aditi-pandit Will address the failing header checks soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the dsdgen should be moved to the external folder.
47e2545
to
cc187f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intiial comments. There might be more but to get you going.
presto-native-execution/presto_cpp/main/connectors/tpcds/CMakeLists.txt
Outdated
Show resolved
Hide resolved
# Add subdirectories | ||
add_subdirectory(dsdgen/dsdgen-c) | ||
|
||
add_library(append_info OBJECT include/append_info-c.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda strange there is a cpp file in a directory called "include" which I presume means cmake doesn't look for CMakeLists.txt in it so we need the CMP0079
policy to define the target here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's why we need the policy here.
Changed the directory name to "utils".
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.h
Outdated
Show resolved
Hide resolved
cc187f0
to
eb763e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdabre12 Some comments. I think we can merge the external portion first in a separate commit as they are copied externally. That makes it easy to review the new bits. I see that it is already a separate commit.
@aditi-pandit, @tdcmeehan I think we should have a RFC for this work. What do you think? The RFC can be simple.
* TPCH connector, with ``tpch.naming=standard`` catalog property. | ||
|
||
* TPCDS connector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a column naming issue for TPCDS similar to TPCH?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of the column naming issue for TPCH, can you please elaborate?
#pragma once | ||
|
||
#include "velox/common/memory/Memory.h" | ||
#include "velox/vector/ComplexVector.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this header? Can we forward declare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Should be possible to forward declare RowVectorPtr.
presto-native-execution/presto_cpp/main/connectors/tpcds/TpcdsGen.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/utils/append_info-c.hpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/utils/append_info-c.hpp
Outdated
Show resolved
Hide resolved
presto-tpcds/src/main/java/com/facebook/presto/tpcds/TpcdsMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdabre12, @pramodsatya : Have a first round of comments.
// TpcdsTransactionHandle is special since | ||
// the corresponding class in Java is an enum. | ||
|
||
namespace facebook::presto::protocol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add API's to PrestoToVeloxConnector to handle transaction handles as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, could this be taken up in a separate PR? (since it affects transaction handles for all connectors)
presto-native-execution/presto_cpp/main/connectors/tpcds/utils/append_info-c.hpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/utils/append_info-c.hpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/utils/append_info-c.hpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.h
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.h
Outdated
Show resolved
Hide resolved
vector_size_t parallel, | ||
vector_size_t child); | ||
|
||
void initializeTable(const std::vector<VectorPtr>& children, int table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments for each of the methods in this class as this is the final interface seen over dsdgen
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.h
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/tpcds/DSDGenIterator.h
Outdated
Show resolved
Hide resolved
@pdabre12 @majetideepak : Small RFC is okay. Let's bring this to Native worker working group next session so that it gets immediate attention from others. |
eb763e0
to
ce9914b
Compare
ce9914b
to
9dba598
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local docs build.
5ff51ef
to
a1afddd
Compare
b9c985d
to
fb56a83
Compare
Co-authored-by: Pramod Satya <pramod.satya@ibm.com>
fb56a83
to
09a13d0
Compare
09a13d0
to
a373413
Compare
Co-authored-by: Pramod Satya <pramod.satya@ibm.com>
a373413
to
ce1e567
Compare
Description
Adds the native TPC-DS connector and the codegen (dsdgen) files. The TPC-DS connector will be a wrapper for the data generator (dsdgen) provided by the TPC organization, originally implemented in C. This implementation must mimic the behavior of the original C implementation. DuckDB has a TPC-DS connector that uses C++ files to wrap the C implementation, and we will use these C++ files for our implementation.
Motivation and Context
The goal is to add a TPC-DS connector to enhance our ability to write end-to-end tests for Prestissimo and conduct micro-benchmarks in Velox. This addition will significantly improve our testing capabilities by ensuring comprehensive coverage and performance validation.
Impact
With this change, we can now write e2e tests and microbenchmarks.
Test Plan
Native end-to-end tests.
Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Resolves: #22361