Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-8327][CORE] Introduce the ConfigEntry to make the config definition more flexible #8328

Merged
merged 1 commit into from
Dec 30, 2024

Conversation

yikf
Copy link
Contributor

@yikf yikf commented Dec 24, 2024

What changes were proposed in this pull request?

#8327 . subtask of #8326

How was this patch tested?

unit tests and GA

Copy link

#8327

Copy link

Run Gluten Clickhouse CI on x86

@yikf
Copy link
Contributor Author

yikf commented Dec 24, 2024

@zhztheplayer @jackylee-ch Could you please take a look, thanks!

private val TIME_STRING_PATTERN = Pattern.compile("(-?[0-9]+)([a-z]+)?")

def timeFromString(str: String, unit: TimeUnit): Long = {
val lower = str.toLowerCase(Locale.ROOT).trim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use JavaUtils.timeStringAs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

new TypedConfigBuilder(this, byteFromString(_, unit), byteToString(_, unit))
}

def fallbackConf[T](fallback: ConfigEntry[T]): ConfigEntry[T] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any different between withAlternatives and fallbackConf? maybe only use one of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallbackConf is a complete ConfigEntry, including doc, version information, etc. withAlternatives is just a key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withAlternative should be enough if we have some config name changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meanings expressed are still different. "Alternative" indicates a choice, while for "rename", what we aim to do is to eventually deprecated it, and it should contain more warning information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the fallbackConf means we would use the conf value from other configs espetially when current config not set, and the withAlternative means the config name changed.

I'm ok with current meaning of fallbackConf, but for withAlternative, deprecated sounds better to me. The old config name is not recommended. Using deperacated can better remind users and make it easier for us to remove it in the future.
A little case for withAlternative:

 val EXTENDED_COLUMNAR_TRANSFORM_RULES =
    buildConf("spark.gluten.sql.columnar.extended.columnar.transform.rules")
      .withAlternative("spark.gluten.sql.columnar.extended.columnar.pre.rules")
      .doc("A comma-separated list of classes for the extended columnar transform rules.")
      .stringConf
      .createWithDefaultString("")

After:

 val EXTENDED_COLUMNAR_PRE_RULES =
    buildConf("spark.gluten.sql.columnar.extended.columnar.pre.rules")
      .doc("A comma-separated list of classes for the extended columnar transform rules.")
      .deprecated('1.2.0')
      .stringConf
      .createWithDefaultString("")

 val EXTENDED_COLUMNAR_TRANSFORM_RULES =
    buildConf("spark.gluten.sql.columnar.extended.columnar.transform.rules")
      .doc("A comma-separated list of classes for the extended columnar transform rules.")
      .version('1.2.0')
      .stringConf
      .fallbackConf(EXTENDED_COLUMNAR_PRE_RULES)

@zhztheplayer any thoughs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, vanilla Spark already has some cases for this kind of definition.

  val ARROW_PYSPARK_FALLBACK_ENABLED =
    buildConf("spark.sql.execution.arrow.pyspark.fallback.enabled")
      .doc(s"When true, optimizations enabled by '${ARROW_PYSPARK_EXECUTION_ENABLED.key}' will " +
        "fallback automatically to non-optimized implementations if an error occurs.")
      .version("3.0.0")
      .fallbackConf(ARROW_FALLBACK_ENABLED)

  val ARROW_FALLBACK_ENABLED =
    buildConf("spark.sql.execution.arrow.fallback.enabled")
      .doc("(Deprecated since Spark 3.0, please set " +
        "'spark.sql.execution.arrow.pyspark.fallback.enabled'.)")
      .version("2.4.0")
      .booleanConf
      .createWithDefault(true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackylee-ch

Thank you for sharing ideas!

I see Spark's SQLConf.scala uses both fallbackConf (as mentioned) and withAlternative for this kind of key routiing functionality. So would you suggest adding another deprecated API to emphasis on the deprecation of the configuration options?

Copy link
Contributor

@jackylee-ch jackylee-ch Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would you suggest adding another deprecated API to emphasis on the deprecation of the configuration options?

Yeap, and the withAlternative can be changed with deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, will follow this approach at the subsequent PR to change the configuration name.

Copy link
Member

@zhztheplayer zhztheplayer Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See you guys' points. Temporarily, I think it's reasonable to keep the first PR aligned with vanilla Spark's manner. Let's see if we could use our own way to handle the deprecated options in subsequent efforts. Thanks!

@zhztheplayer zhztheplayer changed the title [GLUTEN-8327][CORE]Introduce the ConfigEntry to make the config definition more flexible. [GLUTEN-8327][CORE] Introduce the ConfigEntry to make the config definition more flexible. Dec 25, 2024
Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on the configuration enhancements. This series of work could be a big millstone for Gluten's overall code quality.

Also, would you like to add some comments on the newly added config-related classes (if needed), mentioning that the code is similar to Spark's relevant code but extended for Gluten's use? So users will know the difference here more quickly.


override def version: String = _version

override def backend: BackendType = _backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this API is not yet called. This is designed for future use, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

override def defaultValueString: String = stringConverter(_defaultVal)
}

private class ConfigEntryWithDefaultString[T](
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is private (actually equivalent to private[config]) while others are private[gluten]. Is there any particular reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, will keep consistent.

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

1 similar comment
Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

1 similar comment
Copy link

Run Gluten Clickhouse CI on x86

new TypedConfigBuilder(this, byteFromString(_, unit), byteToString(_, unit))
}

def fallbackConf[T](fallback: ConfigEntry[T]): ConfigEntry[T] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withAlternative should be enough if we have some config name changed.

private[config] var _backend = BackendType.COMMON
private[config] var _public = true
private[config] var _alternatives = List.empty[String]
private[config] var _onCreate: Option[ConfigEntry[_] => Unit] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add deprecated here, so we can deprecated some configs and warning user while user setting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be the case, maybe. onCreate occurs when the entry is created and it always happens. However, deprecation should take place when the user actively sets a deprecated key. Checking for deprecated keys from sparkConf during plugin initialization is more appropriate and will only issue a warning once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your means. We can add the deprecated here and check all the deprecated keys during plugin initialization. This can be done in separated pr.

@zhztheplayer zhztheplayer changed the title [GLUTEN-8327][CORE] Introduce the ConfigEntry to make the config definition more flexible. [GLUTEN-8327][CORE] Introduce the ConfigEntry to make the config definition more flexible Dec 26, 2024
@zhztheplayer
Copy link
Member

@yikf Would you rebase? Thanks!
@jackylee-ch If you have further review comments? Thanks!

Copy link

Run Gluten Clickhouse CI on x86

Copy link
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

@yikf
Copy link
Contributor Author

yikf commented Dec 30, 2024

@jackylee-ch @zhztheplayer Kindly ping, Could you take a look another? and help re-run the failed pipeline : )

@zhztheplayer
Copy link
Member

Thanks for the contribution and reviews!

@jackylee-ch jackylee-ch merged commit 3c53f9f into apache:main Dec 30, 2024
46 checks passed
@yikf yikf deleted the config-entry branch December 30, 2024 08:06
@yikf
Copy link
Contributor Author

yikf commented Dec 30, 2024

@jackylee-ch @zhztheplayer Thanks all guys!

@baibaichen
Copy link
Contributor

baibaichen commented Dec 31, 2024

@yikf would we directly use org.apache.spark.ConfigEntry without defining a new ConfigEntry trait

@yikf
Copy link
Contributor Author

yikf commented Dec 31, 2024

@yikf would we directly use org.apache.spark.ConfigEntry without defining a new ConfigEntry trait

Hi @baibaichen , Thank you for your well-considered insights. We hope to use a custom ConfigEntry to achieve more flexible configuration. You can refer to the related discussion at #6970 (comment).

baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 31, 2024
zhztheplayer pushed a commit that referenced this pull request Jan 1, 2025
…fig definition more flexible (#8328)" (#8382)

This reverts commit 3c53f9f.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants