Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VELOX_BUILD_MINIMAL_WITH_DWIO compilation option #8682

Closed
wants to merge 1 commit into from

Conversation

pedroerp
Copy link
Contributor

@pedroerp pedroerp commented Feb 6, 2024

VELOX_BUILD_MINIMAL_WITH_DWIO allows developers using Velox to compile only dwio (in addition to Velox minimal), but without pulling all other dependencies and internal libraries (exec, connectors, parser, aggregates, storage adapters, etc).

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 6, 2024
Copy link

netlify bot commented Feb 6, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d682418
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65cb97d6f011ad0008161e72

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some nits..

CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Outdated
set(VELOX_ENABLE_EXAMPLES ON)
endif()

if(${VELOX_BUILD_DWIO})
set(VELOX_ENABLE_DWIO ON)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What if we have contradictory build flags - say VELOX_ENABLE_HIVE_CONNECTOR ON, but VELOX_ENABLE_DWIO OFF ,should we check for that ?

Copy link
Contributor Author

@pedroerp pedroerp Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpai Good question. These flags are a bit all over the place today. I can add a check here, but it seems like we should separate the flags we use internally to control which directories we compile, from the external facing flags that users can toggle, since in many cases they can be incompatible.

Something like this, for example, won't work as expected today:

$ EXTRA_CMAKE_FLAGS=" -DVELOX_BUILD_MINIMAL=ON -DVELOX_ENABLE_HIVE_CONNECTOR=ON" make

Since in line 126 we will overwrite the enable hive connector flag to OFF. And even if it does, hive needs exec and the compression libraries, which won't be available.

Point being that users can't really (or shouldn't) tweak VELOX_ENABLE_HIVE_CONNECTOR manually.

Cc: @assignUser @majetideepak

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the check in any case since is likely a longer discussion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, reorganizing the options and also giving some nice output at the beginning of the configure phase etc. is on my list but atm lower priority. Adding CMakePresets.json would also be good (partially replaces makefile, get's picked up by IDEs...).

Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedroerp what is the difference between VELOX_BUILD_DWIO and VELOX_ENABLE_DWIO? Can we have just one?

@pedroerp pedroerp force-pushed the velox-dwio branch 2 times, most recently from 2e33a79 to b79e967 Compare February 6, 2024 16:29
$(MAKE) cmake BUILD_DIR=debug BUILD_TYPE=debug EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DVELOX_BUILD_MINIMAL=ON"
$(MAKE) build BUILD_DIR=debug

min_debug: minimal_debug
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for backward compatibility

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CMakeLists.txt Outdated
set(VELOX_ENABLE_EXAMPLES ON)
endif()

if(${VELOX_BUILD_DWIO})
set(VELOX_ENABLE_DWIO ON)
endif()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, reorganizing the options and also giving some nice output at the beginning of the configure phase etc. is on my list but atm lower priority. Adding CMakePresets.json would also be good (partially replaces makefile, get's picked up by IDEs...).

CMakeLists.txt Show resolved Hide resolved
@pedroerp
Copy link
Contributor Author

pedroerp commented Feb 6, 2024

@pedroerp what is the difference between VELOX_BUILD_DWIO and VELOX_ENABLE_DWIO? Can we have just one?

@majetideepak Unfortunately, we can't because of the problem I described here:

https://github.com/facebookincubator/velox/pull/8682/files#r1480059051

@majetideepak
Copy link
Collaborator

majetideepak commented Feb 7, 2024

Unfortunately, we can't because of the problem I described here:

@pedroerp I don't think we need two CMake options. We can have one for external usage. The current pattern is to use VELOX_ENABLE_* (like VELOX_ENABLE_PARQUET). For internal usage, we can use set(). For example, set(VELOX_ENABLE_DUCKDB ON) is defined based on whether testing or test utils are on.
There is no easy way to avoid conflicting options except to disallow them or have checks. In your example, we should ideally remove VELOX_ENABLE_HIVE_CONNECTOR option since that cannot exist on its own.
We could set(VELOX_ENABLE_HIVE_CONNECTOR ON) if either of DWRF, Parquet, or storage adapters options are on.

@pedroerp
Copy link
Contributor Author

pedroerp commented Feb 12, 2024

@majetideepak Maybe I'm missing something, but I think that there are two different concepts here:

  1. One wants to enable a "DWIO compilation mode", which basically enables everything needed by DWIO, and disables everything else not needed (storage adapters, examples, benchmarks, tests, etc). This is what I'm trying to achieve.
  2. One just wants to make sure DWIO is enabled (along with its dependencies). This is the VELOX_ENABLE_PARQUET behavior.

The problem is that by default we build a lot of things, so if I just wanted to build enough to support DWIO (1), with a single flag, I would need to manually turn OFF every other flag. Likely I won't get the entire list, and more could be added in the future, adding stuff I don't need to my compilation time.

So with a single flag, what one would naturally try to do is to enable both VELOX_MINIMAL and VELOX_ENABLE_DWIO. This, counter-intuitively, won't do the job as VELOX_MINIMAL will effectively disable DWIO and overwrite the flag the user is enabled trying to enable.

Cc: @assignUser @kgpai

@majetideepak
Copy link
Collaborator

majetideepak commented Feb 13, 2024

@pedroerp In this PR, VELOX_ENABLE_DWIO can simply be a CMake variable set(VELOX_ENABLE_DWIO, ON). It is mainly controlled by other options (VELOX_BUILD_MINIMAL, VELOX_ENABLE_HIVE_CONNECTOR, VELOX_BUILD_DWIO). VELOX_ENABLE_PARQUET is not controlled by any other option.
I also see that the compression libraries will be disabled if VELOX_ENABLE_DWIO = OFF and VELOX_ENABLE_PARQUET=ON which is a problem.
I feel the current solution is adding more to the already broken semantics.
If you want 1) .. enables everything needed by DWIO, and disables everything else ..., let's add VELOX_BUILD_MINIMAL_WITH_DWIO and make it explicit similar to VELOX_BUILD_MINIMAL.

@majetideepak
Copy link
Collaborator

majetideepak commented Feb 13, 2024

There is value in 2) for users who want to disable DWRF and enable only PARQUET. We could add option(VELOX_ENABLE_DWRF "Build dwrf (file format readers/writers)." ON) similar to VELOX_ENABLE_PARQUET. Users will have to disable VELOX_ENABLE_DWRF and enable VELOX_ENABLE_PARQUET.
The compression libraries and adapters will depend on VELOX_ENABLE_HIVE_CONNECTOR which has to tie into these options.

@assignUser
Copy link
Collaborator

Both of your approaches work but as I don't know what the normal use cases would be/which scenarios are more likely. There is a better builtin solution to linking option than just manually set()ing them: https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html but refactoring everything to use this is probably out of scope for this PR?

@pedroerp
Copy link
Contributor Author

I also see that the compression libraries will be disabled if VELOX_ENABLE_DWIO = OFF and VELOX_ENABLE_PARQUET=ON which is a problem.

Parquet is part of dwio, so VELOX_ENABLE_PARQUET should imply VELOX_ENABLE_DWIO. I'll add a check to enforce this.

if you want 1) .. enables everything needed by DWIO, and disables everything else ..., let's add VELOX_BUILD_MINIMAL_WITH_DWIO and make it explicit similar to VELOX_BUILD_MINIMAL.

That was my intent with the VELOX_BUILD_DWIO flag, but we can to rename it to VELOX_BUILD_MINIMAL_WITH_DWIO like you suggested to make it more explicit.

The compression libraries and adapters will depend on VELOX_ENABLE_HIVE_CONNECTOR which has to tie into these options.

Hive connector has a dependency on DWIO (since it needs file readers/writers), but AFAIK no direct dependency on compression libraries. This was only done because hive connector and dwio were bundled together.

@majetideepak

@majetideepak
Copy link
Collaborator

@pedroerp let's remove VELOX_ENABLE_DWIO option since similar to VELOX_ENABLE_HIVE_CONNECTOR it conflicts with other options.
For your need, VELOX_BUILD_MINIMAL_WITH_DWIO should suffice.

@majetideepak
Copy link
Collaborator

Long term as @assignUser suggested, we should use https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html and tighten the semantics. Adding checks is not scalable.

@pedroerp pedroerp force-pushed the velox-dwio branch 2 times, most recently from daea9b2 to f191d9c Compare February 13, 2024 16:15
Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @pedroerp

CMakeLists.txt Outdated
@@ -467,7 +476,11 @@ else()
set(FOLLY_BENCHMARK Folly::follybenchmark)
endif()

if(NOT ${VELOX_BUILD_MINIMAL})
# WIO (ORC/DWRF), Substrait and experimental/codegen depend on protobuf.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DWIO

`VELOX_BUILD_MINIMAL_WITH_DWIO` allows developers using Velox
to compile only dwio (in addition to Velox minimal), but without
pulling all other dependencies and internal libraries (exec,
connectors, etc).
@pedroerp pedroerp changed the title Add VELOX_BUILD_DWIO compilation option Add VELOX_BUILD_MINIMAL_WITH_DWIO compilation option Feb 13, 2024
@facebook-github-bot
Copy link
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pedroerp merged this pull request in f0583e7.

Copy link

Conbench analyzed the 1 benchmark run on commit f0583e76.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants