From e7b3df28e31774b4213151d60fae3b3f667720dd Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Sat, 13 Jul 2024 10:04:11 -0600 Subject: [PATCH 1/5] replace arrow-ballista with datafusion-ballista --- docs/README.md | 14 +---- docs/source/community/communication.md | 59 +------------------ .../source/contributors-guide/architecture.md | 8 +-- .../contributors-guide/code-organization.md | 36 +++++------ .../user-guide/deployment/docker-compose.md | 22 +++---- docs/source/user-guide/deployment/docker.md | 32 +++++----- .../user-guide/deployment/kubernetes.md | 30 +++++----- docs/source/user-guide/flightsql.md | 2 +- docs/source/user-guide/python.md | 2 +- docs/source/user-guide/scheduler.md | 2 +- 10 files changed, 72 insertions(+), 135 deletions(-) diff --git a/docs/README.md b/docs/README.md index e4ea30e53..4b68679a7 100644 --- a/docs/README.md +++ b/docs/README.md @@ -47,16 +47,6 @@ inside a Python virtualenv. ## Release -The documentation is served through the [arrow-site](https://github.com/apache/arrow-site/) repository. To release -a new version of the documentation, follow these steps: +The documentation is served through the [datafusion-site](https://github.com/apache/datafusion-site/) repository. -1. Download the release source tarball (we can only publish documentation from official releases) -2. Run `./build.sh` inside `docs` folder to generate the docs website inside the `build/html` folder. -3. Clone the arrow-site repo -4. Checkout to the `asf-site` branch (NOT `master`) -5. Copy build artifacts into `arrow-site` repo's `ballista` folder with a command such as - -- `cp -rT ./build/html/ ../../arrow-site/ballista/` (doesn't work on mac) -- `rsync -avzr ./build/html/ ../../arrow-site/ballista/` - -6. Commit changes in `arrow-site` and send a PR. +Documentation is published automatically when documentation changes are pushed to the main branch. \ No newline at end of file diff --git a/docs/source/community/communication.md b/docs/source/community/communication.md index e13491920..01f1253ef 100644 --- a/docs/source/community/communication.md +++ b/docs/source/community/communication.md @@ -22,63 +22,10 @@ We welcome participation from everyone and encourage you to join us, ask questions, and get involved. -All participation in the Apache Arrow Ballista project is governed by the +All participation in the Apache DataFusion project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html). -## Questions? +We use the same communication channels as the main DataFusion project: -### Mailing list - -We use datafusion.apache.org's `dev@` mailing list for project management, release -coorindation and design discussions -([subscribe](mailto:dev-subscribe@datafusion.apache.org), -[unsubscribe](mailto:dev-unsubscribe@datafusion.apache.org), -[archives](https://lists.apache.org/list.html?dev@datafusion.apache.org)). - -When emailing the dev list, please make sure to prefix the subject line with a -`[Ballista]` tag, e.g. `"[Ballista] New API for remote data sources"`, so -that the appropriate people in the Apache Arrow community notice the message. - -### Slack and Discord - -We use the official [ASF](https://s.apache.org/slack-invite) Slack workspace -for informal discussions and coordination. This is a great place to meet other -contributors and get guidance on where to contribute. Join us in the -`#arrow-rust` channel. - -We also have a backup Arrow Rust Discord -server ([invite link](https://discord.gg/Qw5gKqHxUM)) in case you are not able -to join the Slack workspace. If you need an invite to the Slack workspace, you -can also ask for one in our Discord server. - -### Sync up video calls - -We have biweekly sync calls every other Thursdays at both 04:00 UTC -and 16:00 UTC (starting September 30, 2021) depending on if there are -items on the agenda to discuss and someone being willing to host. - -Please see the [agenda](https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit) -for the video call link, add topics and to see what others plan to discuss. - -The goals of these calls are: - -1. Help "put a face to the name" of some of other contributors we are working with -2. Discuss / synchronize on the goals and major initiatives from different stakeholders to identify areas where more alignment is needed - -No decisions are made on the call and anything of substance will be discussed on this mailing list or in github issues / google docs. - -We will send a summary of all sync ups to the dev@datafusion.apache.org mailing list. - -## Contributing - -Our source code is hosted on -[GitHub](https://github.com/apache/arrow-datafusion). More information on contributing is in -the [Contribution Guide](https://github.com/apache/arrow-datafusion/blob/master/CONTRIBUTING.md) -, and we have curated a [good-first-issue] -(https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) -list to help you get started. You can find datafusion's major designs in docs/source/specification. - -We use GitHub issues for maintaining a queue of development work and as the -public record. We often use Google docs, Github issues and pull requests for -quick and small design discussions. For major design change proposals, we encourage you to write a rfc. +[https://datafusion.apache.org/contributor-guide/communication.html](https://datafusion.apache.org/contributor-guide/communication.html) \ No newline at end of file diff --git a/docs/source/contributors-guide/architecture.md b/docs/source/contributors-guide/architecture.md index 4541e4d0a..6cec186fd 100644 --- a/docs/source/contributors-guide/architecture.md +++ b/docs/source/contributors-guide/architecture.md @@ -94,9 +94,9 @@ can execute multiple partitions of the same plan in parallel. There are multiple clients available for submitting jobs to a Ballista cluster: -- The [Ballista CLI](https://github.com/apache/arrow-ballista/tree/main/ballista-cli) provides a SQL command-line +- The [Ballista CLI](https://github.com/apache/datafusion-ballista/tree/main/ballista-cli) provides a SQL command-line interface. -- The Python bindings ([PyBallista](https://github.com/apache/arrow-ballista/tree/main/python)) provide a session +- The Python bindings ([PyBallista](https://github.com/apache/datafusion-ballista/tree/main/python)) provide a session context with support for SQL and DataFrame operations. - The [ballista crate](https://crates.io/crates/ballista) provides a native Rust session context with support for SQL and DataFrame operations. @@ -201,5 +201,5 @@ Each executor will re-partition the output of the stage it is running so that it stage. This mechanism is known as an Exchange or a Shuffle. The logic for this can be found in the [ShuffleWriterExec] and [ShuffleReaderExec] operators. -[shufflewriterexec]: https://github.com/apache/arrow-ballista/blob/main/ballista/core/src/execution_plans/shuffle_writer.rs -[shufflereaderexec]: https://github.com/apache/arrow-ballista/blob/main/ballista/core/src/execution_plans/shuffle_reader.rs +[shufflewriterexec]: https://github.com/apache/datafusion-ballista/blob/main/ballista/core/src/execution_plans/shuffle_writer.rs +[shufflereaderexec]: https://github.com/apache/datafusion-ballista/blob/main/ballista/core/src/execution_plans/shuffle_reader.rs diff --git a/docs/source/contributors-guide/code-organization.md b/docs/source/contributors-guide/code-organization.md index 6b830589d..e1f3e4706 100644 --- a/docs/source/contributors-guide/code-organization.md +++ b/docs/source/contributors-guide/code-organization.md @@ -23,33 +23,33 @@ This section provides links to the source code for major areas of functionality. ### ballista-core crate -- [Crate Source](https://github.com/apache/arrow-ballista/blob/main/ballista/core) -- [Protocol Buffer Definition](https://github.com/apache/arrow-ballista/blob/main/ballista/core/proto/ballista.proto) -- [Execution Plans](https://github.com/apache/arrow-ballista/tree/main/ballista/core/src/execution_plans) -- [Ballista Client](https://github.com/apache/arrow-ballista/blob/main/ballista/core/src/client.rs) +- [Crate Source](https://github.com/apache/datafusion-ballista/blob/main/ballista/core) +- [Protocol Buffer Definition](https://github.com/apache/datafusion-ballista/blob/main/ballista/core/proto/ballista.proto) +- [Execution Plans](https://github.com/apache/datafusion-ballista/tree/main/ballista/core/src/execution_plans) +- [Ballista Client](https://github.com/apache/datafusion-ballista/blob/main/ballista/core/src/client.rs) ### ballista-scheduler crate -- [Crate Source](https://github.com/apache/arrow-ballista/tree/main/ballista/scheduler) -- [Distributed Query Planner](https://github.com/apache/arrow-ballista/blob/main/ballista/scheduler/src/planner.rs) -- [gRPC Service](https://github.com/apache/arrow-ballista/blob/main/ballista/scheduler/src/scheduler_server/grpc.rs) -- [Flight SQL Service](https://github.com/apache/arrow-ballista/blob/main/ballista/scheduler/src/flight_sql.rs) -- [REST API](https://github.com/apache/arrow-ballista/tree/main/ballista/scheduler/src/api) -- [Web UI](https://github.com/apache/arrow-ballista/tree/main/ballista/scheduler/ui) -- [Prometheus Integration](https://github.com/apache/arrow-ballista/blob/main/ballista/scheduler/src/metrics/prometheus.rs) +- [Crate Source](https://github.com/apache/datafusion-ballista/tree/main/ballista/scheduler) +- [Distributed Query Planner](https://github.com/apache/datafusion-ballista/blob/main/ballista/scheduler/src/planner.rs) +- [gRPC Service](https://github.com/apache/datafusion-ballista/blob/main/ballista/scheduler/src/scheduler_server/grpc.rs) +- [Flight SQL Service](https://github.com/apache/datafusion-ballista/blob/main/ballista/scheduler/src/flight_sql.rs) +- [REST API](https://github.com/apache/datafusion-ballista/tree/main/ballista/scheduler/src/api) +- [Web UI](https://github.com/apache/datafusion-ballista/tree/main/ballista/scheduler/ui) +- [Prometheus Integration](https://github.com/apache/datafusion-ballista/blob/main/ballista/scheduler/src/metrics/prometheus.rs) ### ballista-executor crate -- [Crate Source](https://github.com/apache/arrow-ballista/tree/main/ballista/executor) -- [Flight Service](https://github.com/apache/arrow-ballista/blob/main/ballista/executor/src/flight_service.rs) -- [Executor Server](https://github.com/apache/arrow-ballista/blob/main/ballista/executor/src/executor_server.rs) +- [Crate Source](https://github.com/apache/datafusion-ballista/tree/main/ballista/executor) +- [Flight Service](https://github.com/apache/datafusion-ballista/blob/main/ballista/executor/src/flight_service.rs) +- [Executor Server](https://github.com/apache/datafusion-ballista/blob/main/ballista/executor/src/executor_server.rs) ### ballista crate -- [Crate Source](https://github.com/apache/arrow-ballista/tree/main/ballista/client) -- [Context](https://github.com/apache/arrow-ballista/blob/main/ballista/client/src/context.rs) +- [Crate Source](https://github.com/apache/datafusion-ballista/tree/main/ballista/client) +- [Context](https://github.com/apache/datafusion-ballista/blob/main/ballista/client/src/context.rs) ### PyBallista -- [Source](https://github.com/apache/arrow-ballista/tree/main/python) -- [Context](https://github.com/apache/arrow-ballista/blob/main/python/src/context.rs) +- [Source](https://github.com/apache/datafusion-ballista/tree/main/python) +- [Context](https://github.com/apache/datafusion-ballista/blob/main/python/src/context.rs) diff --git a/docs/source/user-guide/deployment/docker-compose.md b/docs/source/user-guide/deployment/docker-compose.md index f09490f9c..53501c781 100644 --- a/docs/source/user-guide/deployment/docker-compose.md +++ b/docs/source/user-guide/deployment/docker-compose.md @@ -23,31 +23,31 @@ Docker Compose is a convenient way to launch a cluster when testing locally. ## Build Docker Images -Run the following commands to download the [official Docker image](https://github.com/apache/arrow-ballista/pkgs/container/arrow-ballista-standalone): +Run the following commands to download the [official Docker image](https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone): ```bash -docker pull ghcr.io/apache/arrow-ballista-standalone:0.12.0-rc4 +docker pull ghcr.io/apache/datafusion-ballista-standalone:0.12.0-rc4 ``` Altenatively run the following commands to clone the source repository and build the Docker images from source: ```bash -git clone git@github.com:apache/arrow-ballista.git -b 0.12.0 -cd arrow-ballista +git clone git@github.com:apache/datafusion-ballista.git -b 0.12.0 +cd datafusion-ballista ./dev/build-ballista-docker.sh ``` This will create the following images: -- `apache/arrow-ballista-benchmarks:0.12.0` -- `apache/arrow-ballista-cli:0.12.0` -- `apache/arrow-ballista-executor:0.12.0` -- `apache/arrow-ballista-scheduler:0.12.0` -- `apache/arrow-ballista-standalone:0.12.0` +- `apache/datafusion-ballista-benchmarks:0.12.0` +- `apache/datafusion-ballista-cli:0.12.0` +- `apache/datafusion-ballista-executor:0.12.0` +- `apache/datafusion-ballista-scheduler:0.12.0` +- `apache/datafusion-ballista-standalone:0.12.0` ## Start a Cluster -Using the [docker-compose.yml](https://github.com/apache/arrow-ballista/blob/main/docker-compose.yml) from the +Using the [docker-compose.yml](https://github.com/apache/datafusion-ballista/blob/main/docker-compose.yml) from the source repository, run the following command to start a cluster: ```bash @@ -77,5 +77,5 @@ The scheduler web UI is available on port 80 in the scheduler. ## Connect from the Ballista CLI ```shell -docker run --network=host -it apache/arrow-ballista-cli:0.12.0 --host localhost --port 50050 +docker run --network=host -it apache/datafusion-ballista-cli:0.12.0 --host localhost --port 50050 ``` diff --git a/docs/source/user-guide/deployment/docker.md b/docs/source/user-guide/deployment/docker.md index 291c98e3f..b67e267e0 100644 --- a/docs/source/user-guide/deployment/docker.md +++ b/docs/source/user-guide/deployment/docker.md @@ -21,27 +21,27 @@ ## Build Docker Images -Run the following commands to download the [official Docker image](https://github.com/apache/arrow-ballista/pkgs/container/arrow-ballista-standalone): +Run the following commands to download the [official Docker image](https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone): ```bash -docker pull ghcr.io/apache/arrow-ballista-standalone:0.12.0-rc4 +docker pull ghcr.io/apache/datafusion-ballista-standalone:0.12.0-rc4 ``` Altenatively run the following commands to clone the source repository and build the Docker images from source: ```bash -git clone git@github.com:apache/arrow-ballista.git -b 0.12.0 -cd arrow-ballista +git clone git@github.com:apache/datafusion-ballista.git -b 0.12.0 +cd datafusion-ballista ./dev/build-ballista-docker.sh ``` This will create the following images: -- `apache/arrow-ballista-benchmarks:0.12.0` -- `apache/arrow-ballista-cli:0.12.0` -- `apache/arrow-ballista-executor:0.12.0` -- `apache/arrow-ballista-scheduler:0.12.0` -- `apache/arrow-ballista-standalone:0.12.0` +- `apache/datafusion-ballista-benchmarks:0.12.0` +- `apache/datafusion-ballista-cli:0.12.0` +- `apache/datafusion-ballista-executor:0.12.0` +- `apache/datafusion-ballista-scheduler:0.12.0` +- `apache/datafusion-ballista-standalone:0.12.0` ## Start a Cluster @@ -51,7 +51,7 @@ Start a scheduler using the following syntax: ```bash docker run --network=host \ - -d apache/arrow-ballista-scheduler:0.12.0 \ + -d apache/datafusion-ballista-scheduler:0.12.0 \ --bind-port 50050 ``` @@ -60,7 +60,7 @@ Run `docker ps` to check that the process is running: ``` $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -a756055576f3 apache/arrow-ballista-scheduler:0.12.0 "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson +a756055576f3 apache/datafusion-ballista-scheduler:0.12.0 "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson ``` Run `docker logs CONTAINER_ID` to check the output from the process: @@ -84,7 +84,7 @@ Start one or more executor processes. Each executor process will need to listen ```bash docker run --network=host \ - -d apache/arrow-ballista-executor:0.12.0 \ + -d apache/datafusion-ballista-executor:0.12.0 \ --external-host localhost --bind-port 50051 ``` @@ -93,8 +93,8 @@ Use `docker ps` to check that both the scheduler and executor(s) are now running ``` $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -fb8b530cee6d apache/arrow-ballista-executor:0.12.0 "/root/executor-entr…" 2 seconds ago Up 1 second gallant_galois -a756055576f3 apache/arrow-ballista-scheduler:0.12.0 "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson +fb8b530cee6d apache/datafusion-ballista-executor:0.12.0 "/root/executor-entr…" 2 seconds ago Up 1 second gallant_galois +a756055576f3 apache/datafusion-ballista-scheduler:0.12.0 "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson ``` Use `docker logs CONTAINER_ID` to check the output from the executor(s): @@ -117,7 +117,7 @@ to launch the scheduler with this option enabled. ```bash docker run --network=host \ - -d apache/arrow-ballista-scheduler:0.12.0 \ + -d apache/datafusion-ballista-scheduler:0.12.0 \ --bind-port 50050 \ --config-backend etcd \ --etcd-urls etcd:2379 @@ -129,5 +129,5 @@ recommended. ## Connect from the CLI ```shell -docker run --network=host -it apache/arrow-ballista-cli:0.12.0 --host localhost --port 50050 +docker run --network=host -it apache/datafusion-ballista-cli:0.12.0 --host localhost --port 50050 ``` diff --git a/docs/source/user-guide/deployment/kubernetes.md b/docs/source/user-guide/deployment/kubernetes.md index eebe6e1c9..2bdb4fb69 100644 --- a/docs/source/user-guide/deployment/kubernetes.md +++ b/docs/source/user-guide/deployment/kubernetes.md @@ -41,37 +41,37 @@ microk8s enable dns ## Build Docker Images -Run the following commands to download the [official Docker image](https://github.com/apache/arrow-ballista/pkgs/container/arrow-ballista-standalone): +Run the following commands to download the [official Docker image](https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone): ```bash -docker pull ghcr.io/apache/arrow-ballista-standalone:0.12.0-rc4 +docker pull ghcr.io/apache/datafusion-ballista-standalone:0.12.0-rc4 ``` Altenatively run the following commands to clone the source repository and build the Docker images from source: ```bash -git clone git@github.com:apache/arrow-ballista.git -b 0.12.0 -cd arrow-ballista +git clone git@github.com:apache/datafusion-ballista.git -b 0.12.0 +cd datafusion-ballista ./dev/build-ballista-docker.sh ``` This will create the following images: -- `apache/arrow-ballista-benchmarks:0.12.0` -- `apache/arrow-ballista-cli:0.12.0` -- `apache/arrow-ballista-executor:0.12.0` -- `apache/arrow-ballista-scheduler:0.12.0` -- `apache/arrow-ballista-standalone:0.12.0` +- `apache/datafusion-ballista-benchmarks:0.12.0` +- `apache/datafusion-ballista-cli:0.12.0` +- `apache/datafusion-ballista-executor:0.12.0` +- `apache/datafusion-ballista-scheduler:0.12.0` +- `apache/datafusion-ballista-standalone:0.12.0` ## Publishing Docker Images Once the images have been built, you can retag them and can push them to your favourite Docker registry. ```bash -docker tag apache/arrow-ballista-scheduler:0.12.0 /arrow-ballista-scheduler:0.12.0 -docker tag apache/arrow-ballista-executor:0.12.0 /arrow-ballista-executor:0.12.0 -docker push /arrow-ballista-scheduler:0.12.0 -docker push /arrow-ballista-executor:0.12.0 +docker tag apache/datafusion-ballista-scheduler:0.12.0 /datafusion-ballista-scheduler:0.12.0 +docker tag apache/datafusion-ballista-executor:0.12.0 /datafusion-ballista-executor:0.12.0 +docker push /datafusion-ballista-scheduler:0.12.0 +docker push /datafusion-ballista-executor:0.12.0 ``` ## Create Persistent Volume and Persistent Volume Claim @@ -159,7 +159,7 @@ spec: spec: containers: - name: ballista-scheduler - image: /arrow-ballista-scheduler:0.12.0 + image: /datafusion-ballista-scheduler:0.12.0 args: ["--bind-port=50050"] ports: - containerPort: 50050 @@ -191,7 +191,7 @@ spec: spec: containers: - name: ballista-executor - image: /arrow-ballista-executor:0.12.0 + image: /datafusion-ballista-executor:0.12.0 args: - "--bind-port=50051" - "--scheduler-host=ballista-scheduler" diff --git a/docs/source/user-guide/flightsql.md b/docs/source/user-guide/flightsql.md index cb420e3de..a59f8d2a8 100644 --- a/docs/source/user-guide/flightsql.md +++ b/docs/source/user-guide/flightsql.md @@ -54,7 +54,7 @@ choco install docker-desktop ## Run Docker Container ```shell -docker run -p 50050:50050 --rm ghcr.io/apache/arrow-ballista-standalone:0.10.0 +docker run -p 50050:50050 --rm ghcr.io/apache/datafusion-ballista-standalone:0.10.0 ``` ## Download the FlightSQL JDBC Driver diff --git a/docs/source/user-guide/python.md b/docs/source/user-guide/python.md index 80ce8aa5d..674850c70 100644 --- a/docs/source/user-guide/python.md +++ b/docs/source/user-guide/python.md @@ -135,4 +135,4 @@ assert result.column(1) == pyarrow.array([-3, -3, -3]) ## User Defined Functions The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in -Ballista. It is planned for a future release. The tracking issue is [#173](https://github.com/apache/arrow-ballista/issues/173). +Ballista. It is planned for a future release. The tracking issue is [#173](https://github.com/apache/datafusion-ballista/issues/173). diff --git a/docs/source/user-guide/scheduler.md b/docs/source/user-guide/scheduler.md index 6ac81ed2c..447992cea 100644 --- a/docs/source/user-guide/scheduler.md +++ b/docs/source/user-guide/scheduler.md @@ -21,7 +21,7 @@ ## Web User Interface -The scheduler provides a web user interface that allows queries to be monitored. Details on how to start the ui is present [here](https://github.com/apache/arrow-ballista/tree/main/ballista/scheduler/ui) +The scheduler provides a web user interface that allows queries to be monitored. Details on how to start the ui is present [here](https://github.com/apache/datafusion-ballista/tree/main/ballista/scheduler/ui) ![Ballista Scheduler Web UI](./images/ballista-web-ui.png) From 42e5b949f05d3676b0b6ed49081937ed770c256f Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Sat, 13 Jul 2024 10:06:10 -0600 Subject: [PATCH 2/5] update more references to Arrow --- .asf.yaml | 5 ++ .github/workflows/docs.yaml | 81 +++++++++++++++++++ docs/source/_templates/layout.html | 2 +- docs/source/community/communication.md | 4 +- docs/source/conf.py | 2 +- docs/source/contributors-guide/development.md | 2 +- docs/source/index.rst | 2 +- docs/source/user-guide/faq.md | 5 +- docs/source/user-guide/flightsql.md | 2 +- docs/source/user-guide/introduction.md | 2 +- 10 files changed, 95 insertions(+), 12 deletions(-) create mode 100644 .github/workflows/docs.yaml diff --git a/.asf.yaml b/.asf.yaml index aebf957d6..dae11d120 100644 --- a/.asf.yaml +++ b/.asf.yaml @@ -39,3 +39,8 @@ github: rebase: false features: issues: true +# publishes the content of the `asf-site` branch to +# https://datafusion.apache.org/ballista/ +publish: + whoami: asf-site + subdir: ballista \ No newline at end of file diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml new file mode 100644 index 000000000..257224f1e --- /dev/null +++ b/.github/workflows/docs.yaml @@ -0,0 +1,81 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +on: + push: + branches: + - main + paths: + - .asf.yaml + - .github/workflows/docs.yaml + - docs/** + +name: Deploy DataFusion Ballista site + +jobs: + build-docs: + name: Build docs + runs-on: ubuntu-latest + steps: + - name: Checkout docs sources + uses: actions/checkout@v4 + + - name: Checkout asf-site branch + uses: actions/checkout@v4 + with: + ref: asf-site + path: asf-site + + - name: Setup Python + uses: actions/setup-python@v5 + with: + python-version: "3.10" + + - name: Install dependencies + run: | + set -x + python3 -m venv venv + source venv/bin/activate + pip install -r docs/requirements.txt + + - name: Build docs + run: | + set -x + source venv/bin/activate + cd docs + ./build.sh + + - name: Copy & push the generated HTML + run: | + set -x + cd asf-site/ + rsync \ + -a \ + --delete \ + --exclude '/.git/' \ + ../docs/build/html/ \ + ./ + cp ../.asf.yaml . + touch .nojekyll + git status --porcelain + if [ "$(git status --porcelain)" != "" ]; then + git config user.name "github-actions[bot]" + git config user.email "github-actions[bot]@users.noreply.github.com" + git add --all + git commit -m 'Publish built docs triggered by ${{ github.sha }}' + git push || git push --force + fi \ No newline at end of file diff --git a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html index 101a49ab0..921dffaae 100644 --- a/docs/source/_templates/layout.html +++ b/docs/source/_templates/layout.html @@ -17,7 +17,7 @@ {% endfor %} diff --git a/docs/source/community/communication.md b/docs/source/community/communication.md index 01f1253ef..987c8e824 100644 --- a/docs/source/community/communication.md +++ b/docs/source/community/communication.md @@ -22,10 +22,10 @@ We welcome participation from everyone and encourage you to join us, ask questions, and get involved. -All participation in the Apache DataFusion project is governed by the +All participation in the Apache DataFusion Ballista project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html). We use the same communication channels as the main DataFusion project: -[https://datafusion.apache.org/contributor-guide/communication.html](https://datafusion.apache.org/contributor-guide/communication.html) \ No newline at end of file +[https://datafusion.apache.org/contributor-guide/communication.html](https://datafusion.apache.org/contributor-guide/communication.html) diff --git a/docs/source/conf.py b/docs/source/conf.py index ae2f03ebf..7a3477f80 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -33,7 +33,7 @@ # -- Project information ----------------------------------------------------- -project = 'Apache Arrow Ballista' +project = 'Apache DataFusion Ballista' copyright = '2019-2024, Apache Software Foundation' author = 'Apache Software Foundation' diff --git a/docs/source/contributors-guide/development.md b/docs/source/contributors-guide/development.md index e5bb1e4b1..a21595b10 100644 --- a/docs/source/contributors-guide/development.md +++ b/docs/source/contributors-guide/development.md @@ -22,7 +22,7 @@ We welcome participation from everyone and encourage you to join us, ask questions, and get involved. -All participation in the Apache Arrow Ballista project is governed by the +All participation in the Apache DataFusion Ballista project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html). diff --git a/docs/source/index.rst b/docs/source/index.rst index 60e810b32..9491eccf5 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -16,7 +16,7 @@ .. under the License. ===================== -Apache Arrow Ballista +Apache DataFusion Ballista ===================== Table of content diff --git a/docs/source/user-guide/faq.md b/docs/source/user-guide/faq.md index f96306c1c..5cfd9fe8b 100644 --- a/docs/source/user-guide/faq.md +++ b/docs/source/user-guide/faq.md @@ -19,10 +19,7 @@ # Frequently Asked Questions -## What is the relationship between Apache Arrow, DataFusion, and Ballista? - -Apache Arrow is a library which provides a standardized memory representation for columnar data. It also provides -"kernels" for performing common operations on this data. +## What is the relationship between DataFusion and Ballista? DataFusion is a library for executing queries in-process using the Apache Arrow memory model and computational kernels. It is designed to run within a single process, using threads diff --git a/docs/source/user-guide/flightsql.md b/docs/source/user-guide/flightsql.md index a59f8d2a8..4572eef91 100644 --- a/docs/source/user-guide/flightsql.md +++ b/docs/source/user-guide/flightsql.md @@ -79,7 +79,7 @@ The important pieces of information: ## Run a "Hello, World!" Query ```sql -select 'Hello from Arrow Ballista!' as greeting; +select 'Hello from DataFusion Ballista!' as greeting; ``` ## Run a Complex Query diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md index 65cbe2f7c..fbadf13b5 100644 --- a/docs/source/user-guide/introduction.md +++ b/docs/source/user-guide/introduction.md @@ -19,7 +19,7 @@ # Overview -Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. +Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache DataFusion. Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and From eaffdae6f5c6d7ab62e7fd350e316fa9ed6a1a55 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Sat, 13 Jul 2024 10:09:08 -0600 Subject: [PATCH 3/5] fix info on docs publishing --- docs/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.md b/docs/README.md index 4b68679a7..7db82838f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -47,6 +47,6 @@ inside a Python virtualenv. ## Release -The documentation is served through the [datafusion-site](https://github.com/apache/datafusion-site/) repository. +The documentation is published from the `asf-site` branch of this repository. Documentation is published automatically when documentation changes are pushed to the main branch. \ No newline at end of file From 27f89958a804158c35d7bf9345e6e8cddca8c284 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Sat, 13 Jul 2024 10:11:39 -0600 Subject: [PATCH 4/5] update more references --- docs/source/conf.py | 2 +- docs/source/index.rst | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/conf.py b/docs/source/conf.py index 7a3477f80..eab94b1a9 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -90,7 +90,7 @@ html_context = { "github_user": "apache", - "github_repo": "arrow-ballista", + "github_repo": "datafusion-ballista", "github_version": "main", "doc_path": "docs/source", } diff --git a/docs/source/index.rst b/docs/source/index.rst index 9491eccf5..959d5844b 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -65,7 +65,7 @@ Table of content contributors-guide/architecture contributors-guide/code-organization contributors-guide/development - Source code + Source code .. _toc.community: @@ -75,5 +75,5 @@ Table of content community/communication - Issue tracker - Code of conduct + Issue tracker + Code of conduct From e34b4c45f03cfe28ba96a421e0ec900768552828 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Sat, 13 Jul 2024 10:15:20 -0600 Subject: [PATCH 5/5] more updates --- docs/source/community/communication.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/community/communication.md b/docs/source/community/communication.md index c9892f320..295bf46fe 100644 --- a/docs/source/community/communication.md +++ b/docs/source/community/communication.md @@ -34,7 +34,7 @@ We use the same communication channels as the main DataFusion project: Our source code is hosted on [GitHub](https://github.com/apache/datafusion-ballista). More information on contributing is in -the [Contribution Guide](https://github.com/apache/datafusion-ballista/blob/master/CONTRIBUTING.md) +the [Contribution Guide](https://github.com/apache/datafusion-ballista/blob/main/CONTRIBUTING.md) , and we have curated a [good-first-issue](https://github.com/apache/datafusion-ballista/contribute) list to help you get started. You can find datafusion's major designs in docs/source/specification.