Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28812][SQL][DOC] Document SHOW PARTITIONS in SQL Reference #26635

Closed
wants to merge 3 commits into from

Conversation

dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

Document SHOW PARTITIONS statement in SQL Reference Guide.

Why are the changes needed?

Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

Does this PR introduce any user-facing change?

Yes.

Before
After
image
image
image

@SparkQA
Copy link

SparkQA commented Nov 22, 2019

Test build #114279 has finished for PR 26635 at commit dab073a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

cc @srowen

<dt><code><em>partition_spec</em></code></dt>
<dd>
An optional parameter that specifies a comma separated list of key and value pairs
for partitions. When specified, the partitions that matches the partition spec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: match?

</dd>
</dl>

### Example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Examples?

@SparkQA
Copy link

SparkQA commented Nov 22, 2019

Test build #114309 has finished for PR 26635 at commit 7e094c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


### Parameters
<dl>
<dt><code><em>table_identifier</em></code></dt>
Copy link
Member

@dongjoon-hyun dongjoon-hyun Nov 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use table_name instead of table_identifier because it's used more?

$ git grep '<em>table_name</em>'
sql-ref-syntax-aux-analyze-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-aux-cache-cache-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-aux-cache-uncache-table.md: <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-ddl-alter-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-ddl-alter-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-ddl-drop-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-ddl-repair-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-ddl-truncate-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-dml-insert-into.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-dml-insert-overwrite-table.md:  <dt><code><em>table_name</em></code></dt>
sql-ref-syntax-dml-load.md:  <dt><code><em>table_name</em></code></dt>

$ git grep '<em>table_identifier</em>'
sql-ref-syntax-aux-describe-table.md:  <dt><code><em>table_identifier</em></code></dt>
sql-ref-syntax-aux-show-tblproperties.md:  <dt><code><em>table_identifier</em></code></dt>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we used table_name at line 39 of this PR.

Copy link
Contributor Author

@dilipbiswal dilipbiswal Nov 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun For consistency we can. But personally, i would like to tell the users that we accept a table identifier in this command i.e it can be a qualified table.

For example if we look at the following link.
https://docs.databricks.com/spark/latest/spark-sql/language-manual/cache-table.html

We do mention in above in the syntax that the table can be qualified. However, in our doc its not so clear. Thats the reason, i chose to define "table_identifier" as a parameter and had a sub syntax to explain further so there is no ambiguity. Another option is to just have [db_name].table_name in the main syntax block itself. My intention of using a parameter in the main syntax block was to make it easy to read the core syntax.

SHOW PARTITIONS [db_name].table_name 
   [ PARTITION ( partition_col_name = partition_col_val [ , ... ] ) ]

Please let me know what you think and i would change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think so, you are trying to do two things in this single PR.

You had better follow the existing table_name in this PR first. Then, you can create another PR to adjust all of them from the docs. In general, we didn't agree your suggestion is good or bad. In worst case, you know that your suggestion can be rejected or we may choose another 3rd option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun Sure.. I have made the change for this pr.

@SparkQA
Copy link

SparkQA commented Nov 24, 2019

Test build #114332 has finished for PR 26635 at commit 6c95a38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master. Thank you, @dilipbiswal .
I generated the doc locally and verified.

@dilipbiswal
Copy link
Contributor Author

@dongjoon-hyun Thank you very much.

May i please request you to put some thoughts on how we should standardize on specifying table identifiers and partition specifications in the sql reference doc ?

For table identifiers, we have used a few variations that i see in our doc.

  1. [db_name].table_name in main syntax block.
  2. table_name in the main syntax block and a parameter table_name. In this case, we don't seem to specify that it is an table identifier and the syntax to specify it.
  3. table_identifier in main syntax block and then a sub-syntax in the parameter definition.
  4. tableIdentifier in main syntax block and then a sub-syntax in the parameter definition.

We can pick one from above 4 or pick a new way as you mentioned.

For partitioning spec we have used the following variations.

  1. [ PARTITION ( partition_col_name = partition_col_val [ , ... ] ) ] in main syntax block and
    define the same as a parameter as well.
  2. [PARTITION(partition_spec)] in main syntax block and define the same as a parameter. However we have not specified the syntax of the partitioning spec.
  3. PARTITION partition_spec in main syntax block. And define "partition_spec" as parameter. In this case also we have not specified the syntax of the partitioning spec.
  4. [PARTITION partition_spec] in main syntax block and PARTITION ( partition_spec :[ partition_column = partition_col_value, partition_column = partition_col_value, ...] ) in parameter.
  5. partition_spec in main syntax block. And in the parameter section, a sub syntax that defines the syntax of partitioning.

Again we can pick one of the above 5 or pick another option.

cc @srowen @huaxingao

@srowen
Copy link
Member

srowen commented Nov 24, 2019

@dilipbiswal standardization would be welcome. I know the new docs were written by 4-5 different people and I'm sure they're not entirely consistent. I don't have a strong opinion on the option, but consistency is best. Go with whatever is most prevalent in the docs already? but any consistency is an improvement.

@dilipbiswal
Copy link
Contributor Author

@srowen I had an offline discussion with @huaxingao and we have agreed on a format. She has graciously agreed to fix the inconsistencies in the parameter. Thank you @huaxingao.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants