Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Add TPC-DS connector #23067

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pdabre12
Copy link
Contributor

@pdabre12 pdabre12 commented Jun 24, 2024

Description

Adds the native TPC-DS connector and the codegen (dsdgen) files. The TPC-DS connector will be a wrapper for the data generator (dsdgen) provided by the TPC organization, originally implemented in C. This implementation must mimic the behavior of the original C implementation. DuckDB has a TPC-DS connector that uses C++ files to wrap the C implementation, and we will use these C++ files for our implementation.

Motivation and Context

The goal is to add a TPC-DS connector to enhance our ability to write end-to-end tests for Prestissimo and conduct micro-benchmarks in Velox. This addition will significantly improve our testing capabilities by ensuring comprehensive coverage and performance validation.

Impact

With this change, we can now write e2e tests and microbenchmarks.

Test Plan

Native end-to-end tests.
Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

TPC-DS Connector Changes
* Add TPC-DS connector for Presto C++. :pr:`23067`

Resolves: #22361

Copy link

linux-foundation-easycla bot commented Jun 24, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pdabre12 pdabre12 changed the title [native] Adds the TPC-DS connector [WIP] [native] Adds the TPC-DS connector Jul 9, 2024
@steveburnett
Copy link
Contributor

Consider updating the Use Cases in the Presto C++ section of the Presto doc to add TPC-DS to the supported connectors.

@steveburnett
Copy link
Contributor

Consider a release note entry similar to the following:

== RELEASE NOTES ==

TPC-DS Connector Changes
* Add TPC-DS connector for Presto C++. :pr:`23067`

steveburnett
steveburnett previously approved these changes Aug 7, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local docs build, looks good. Thanks!

@pramodsatya pramodsatya changed the title [WIP] [native] Adds the TPC-DS connector [native] Adds the TPC-DS connector Aug 7, 2024
@pdabre12 pdabre12 force-pushed the tpcds-connector branch 2 times, most recently from 6d9a0c5 to e69c19d Compare August 9, 2024 20:30
@pdabre12
Copy link
Contributor Author

pdabre12 commented Aug 9, 2024

@aditi-pandit @yingsu00
This PR is now ready for review. Please take a look.
Thanks.

@pdabre12 pdabre12 marked this pull request as ready for review August 9, 2024 20:32
@pdabre12 pdabre12 requested review from elharo and a team as code owners August 9, 2024 20:32
@pdabre12
Copy link
Contributor Author

pdabre12 commented Aug 9, 2024

FYI: @czentgr

@pdabre12 pdabre12 requested a review from czentgr August 9, 2024 20:42
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya and @pdabre12.

Have done a quick round of review.

Copy link
Contributor

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work done here!

presto-native-execution/etc/catalog/tpcds.properties Outdated Show resolved Hide resolved
# Without this hack, there are multiple link errors similar to the one below
# only on GCC. "undefined reference to `vtable for
# velox::connector::tpcds::TpcdsTableHandle`. TODO: Fix this hack.
target_link_libraries(velox_exec_test_lib presto_tpcds_connector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

velox_exec_test_lib is a collection of velox utilities.
Which objects gets the undefined reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gives an undefined reference to TpcdsTableHandle.
It's similar to

# "undefined reference to `vtable for velox::connector::tpch::TpchTableHandle`"
.


Table fromTableName(std::string_view tableName);

RowVectorPtr genTpcdsCallCenter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all have the same signature but only a different name.
You can use a macro to define the signature and then just generate the signature for each name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the code a little bit, can you take another look now?

@pdabre12
Copy link
Contributor Author

@aditi-pandit @czentgr.
Thanks for the review.
Stalling this PR until we figure out the licensing.

@pdabre12
Copy link
Contributor Author

pdabre12 commented Aug 29, 2024

@czentgr @aditi-pandit
The licensing issue is resolved.
Addressed your comments , can you have another look?
Thank you.

Will address the failing header checks soon.

@majetideepak majetideepak changed the title [native] Adds the TPC-DS connector [native] Add TPC-DS connector Sep 6, 2024
Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the dsdgen should be moved to the external folder.

@pdabre12 pdabre12 force-pushed the tpcds-connector branch 2 times, most recently from 47e2545 to cc187f0 Compare September 6, 2024 19:54
Copy link
Contributor

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intiial comments. There might be more but to get you going.

# Add subdirectories
add_subdirectory(dsdgen/dsdgen-c)

add_library(append_info OBJECT include/append_info-c.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda strange there is a cpp file in a directory called "include" which I presume means cmake doesn't look for CMakeLists.txt in it so we need the CMP0079 policy to define the target here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why we need the policy here.
Changed the directory name to "utils".

Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdabre12 Some comments. I think we can merge the external portion first in a separate commit as they are copied externally. That makes it easy to review the new bits. I see that it is already a separate commit.

@aditi-pandit, @tdcmeehan I think we should have a RFC for this work. What do you think? The RFC can be simple.

* TPCH connector, with ``tpch.naming=standard`` catalog property.

* TPCDS connector.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a column naming issue for TPCDS similar to TPCH?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of the column naming issue for TPCH, can you please elaborate?

#pragma once

#include "velox/common/memory/Memory.h"
#include "velox/vector/ComplexVector.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this header? Can we forward declare?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Should be possible to forward declare RowVectorPtr.

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdabre12, @pramodsatya : Have a first round of comments.

// TpcdsTransactionHandle is special since
// the corresponding class in Java is an enum.

namespace facebook::presto::protocol {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add API's to PrestoToVeloxConnector to handle transaction handles as well ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, could this be taken up in a separate PR? (since it affects transaction handles for all connectors)

vector_size_t parallel,
vector_size_t child);

void initializeTable(const std::vector<VectorPtr>& children, int table);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments for each of the methods in this class as this is the final interface seen over dsdgen

@aditi-pandit
Copy link
Contributor

@pdabre12 Some comments. I think we can merge the external portion first in a separate commit as they are copied externally. That makes it easy to review the new bits. I see that it is already a separate commit.

@aditi-pandit, @tdcmeehan I think we should have a RFC for this work. What do you think? The RFC can be simple.

@pdabre12 @majetideepak : Small RFC is okay. Let's bring this to Native worker working group next session so that it gets immediate attention from others.

steveburnett
steveburnett previously approved these changes Sep 24, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local docs build.

Co-authored-by: Pramod Satya <pramod.satya@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Native] Add Prestissimo TPC-DS connector
6 participants