Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions #29086

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jul 13, 2020

What changes were proposed in this pull request?

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues very closely related together while I am here.

  1. The main idea here is: It reuses the existing logic embedded in dev/run-tests.py which Jenkins PR builder use in order to run only the related test cases.

  2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via checkout@v2. With fetch-depth: 0, the history is available.

  3. In addition, it fixes the dev/run-tests.py to match with python/run-tests.py in terms of its options. Environment variables such as TEST_ONLY_XXX were moved as proper options. For example,

    dev/run-tests.py --modules sql,core

    which is consistent with python/run-tests.py, for example,

    python/run-tests.py --modules pyspark-core,pyspark-ml
  4. Lastly, also fixed the formatting issue in module specification in the matrix:

    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,

    which incorrectly runs build/test the modules.

Why are the changes needed?

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of dev/run-tests.py properly, the usages are similar between dev/run-tests.py and python/run-tests.py, and run network-common, network-shuffle, launcher and examples modules too.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

Manually tested in my own forked Spark:

HyukjinKwon#7
HyukjinKwon#8
HyukjinKwon#9
HyukjinKwon#10
HyukjinKwon#11
HyukjinKwon#12

# If we're running the tests in Github Actions, attempt to detect and test
# only the affected modules.
if test_env == "github_actions":
if os.environ["GITHUB_BASE_REF"] != "":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GITHUB_BASE_REF is only set in the PR builder; otherwise, it's an empty string.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Jenkins PR Builder?

if os.environ["GITHUB_BASE_REF"] != "":
# Pull requests
changed_files = identify_changed_files_from_git_commits(
os.environ["GITHUB_SHA"], target_branch=os.environ["GITHUB_BASE_REF"])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GITHUB_SHA is the commit in PR. GITHUB_BASE_REF is the target repo's branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For many commits in a PR, each time changed_files is different? If a previous commit changed two modules but later one changes only one, will this only run test for the later module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my testing, it combines all commits and changed_files is based on all commits always.

Seems like GITHUB_SHA became a merge commit internally in Github Actions, and GITHUB_BASE_REF becomes the branch which PR opens against.

For example, if a PR has commits 1, 2 and 3, GitHub Actions looks creates a merge commit 4 for commits 1, 2 and 3. And, it calculates the diff from the branch, for example, master in the original repo, Apache Spark.

git diff 4 master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of the merge commit: HyukjinKwon@8f36ec4 at HyukjinKwon#7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Thanks for clarifying. Looks good.

@SparkQA
Copy link

SparkQA commented Jul 13, 2020

Test build #125769 has finished for PR 29086 at commit 9d8555a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -29,7 +29,7 @@ jobs:
modules:
- |-
core, unsafe, kvstore, avro,
network_common, network_shuffle, repl, launcher
network-common, network-shuffle, repl, launcher,
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I missed this.


# If we're running the tests in Github Actions, attempt to detect and test
# only the affected modules.
if test_env == "github_actions":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks safe.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thanks, @HyukjinKwon .
Merged to master.

@HyukjinKwon
Copy link
Member Author

Thank you @dongjoon-hyun!

@HyukjinKwon HyukjinKwon deleted the SPARK-32292 branch July 27, 2020 07:43
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…ub Actions

This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.

In addition, this PR also fixes 3 more issues  very closely related together while I am here.

1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.

2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.

3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,

    ```bash
    dev/run-tests.py --modules sql,core
    ```

    which is consistent with `python/run-tests.py`, for example,

    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```

4. Lastly, also fixed the formatting issue in module specification in the matrix:

    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```

    which incorrectly runs build/test the modules.

By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.

No, dev-only.

Manually tested in my own forked Spark:

#7
#8
#9
#10
#11
#12

Closes apache#29086 from HyukjinKwon/SPARK-32292.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants