[RFC] Unified PPL Data Type #28

penghuo opened this issue Feb 17, 2025 · 0 comments
Labels
enhancement New feature or request

penghuo commented Feb 17, 2025

Is your feature request related to a problem?

Current State: Fragmented Data Type Systems in PPL Engines

Query engines such as OpenSearch PPL and Spark PPL employ distinct data type systems, creating interoperability challenges in multi-engine environments. Key examples include:

  • Type Name Mismatches: OpenSearch PPL defines INTEGER (string representation: integer), while Spark PPL uses IntegerType (string representation: int). Although both represent semantically equivalent 32-bit signed integers, the inconsistent naming disrupts cross-engine workflows.
  • Engine-Specific Types: OpenSearch PPL introduces specialized types like IP and GEO_POINT, which lack native equivalents in other engines.

Impact:

  • Integration Issues: Tools like OpenSearch Dashboards face parsing errors or misaligned visualizations when processing results from engines with mismatched type systems.
  • Manual Overhead: Users must rewrite queries or cast types explicitly when migrating between engines.
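
To make the integration issue concrete, here is a minimal Python sketch of a client that keys its behavior on the type strings an engine returns. The schema payloads, the NUMERIC_TYPES set, and the numeric_columns helper are hypothetical and do not reflect the actual OpenSearch Dashboards code or any engine's response format; only the integer/int spellings come from the example above.

```python
# Hypothetical schema payloads: both engines describe the same 32-bit integer
# column, but report it under different type strings.
OPENSEARCH_SCHEMA = [{"name": "age", "type": "integer"}]
SPARK_SCHEMA = [{"name": "age", "type": "int"}]

# Hypothetical client written against OpenSearch PPL's type strings only.
NUMERIC_TYPES = {"integer", "long", "double"}


def numeric_columns(schema):
    """Return the names of columns the client recognizes as numeric."""
    return [col["name"] for col in schema if col["type"] in NUMERIC_TYPES]


print(numeric_columns(OPENSEARCH_SCHEMA))  # ['age']
print(numeric_columns(SPARK_SCHEMA))       # [] ('age' silently treated as non-numeric)
```

Nothing fails loudly here: the Spark PPL column is simply dropped from the numeric view, which is the kind of misaligned visualization described above.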

What solution would you like?

To eliminate friction and ensure seamless interoperability, all PPL-compliant engines should adopt a common data type system with the following principles:

  • Standardized Type Names: universal type names and string representations (e.g., int instead of INTEGER or IntegerType).
  • Semantic Consistency: equivalent types (e.g., 32-bit integers) must behave identically in syntax, casting rules, and operations (e.g., arithmetic, comparisons). Engine-specific types (e.g., ip, geo_point) should be opt-in extensions with clear documentation.
  • Interoperability Guarantee: queries and schemas written for one engine should execute seamlessly on others without manual adjustments.
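
As a rough illustration of how these principles might be checked during a migration, the Python sketch below models a shared type registry. It is an assumption, not part of this proposal: the alias table (mapping each engine's current native names onto unified names), the EXTENSION_TYPES set, and the canonical_type and same_schema helpers are all hypothetical. Only int, ip, and geo_point come from the text above.

```python
from typing import AbstractSet

# Engine-specific types that are only valid when a caller explicitly opts in.
EXTENSION_TYPES = frozenset({"ip", "geo_point"})

# Hypothetical alias table: (engine, native type name) -> unified type name.
ALIASES = {
    ("opensearch", "INTEGER"): "int",
    ("opensearch", "integer"): "int",
    ("opensearch", "IP"): "ip",
    ("opensearch", "GEO_POINT"): "geo_point",
    ("spark", "IntegerType"): "int",
    ("spark", "int"): "int",
}


def canonical_type(engine: str, native: str,
                   extensions: AbstractSet[str] = frozenset()) -> str:
    """Translate an engine's native type name into its unified name.

    Raises ValueError for unknown types and for extension types that the
    caller has not opted into.
    """
    name = ALIASES.get((engine, native))
    if name is None:
        raise ValueError(f"{engine}: unknown type {native!r}")
    if name in EXTENSION_TYPES and name not in extensions:
        raise ValueError(f"extension type {name!r} is not enabled")
    return name


def same_schema(a, b, extensions: AbstractSet[str] = frozenset()) -> bool:
    """Compare two (engine, [(column, native_type), ...]) schemas by unified type."""
    (engine_a, cols_a), (engine_b, cols_b) = a, b
    norm_a = [(col, canonical_type(engine_a, t, extensions)) for col, t in cols_a]
    norm_b = [(col, canonical_type(engine_b, t, extensions)) for col, t in cols_b]
    return norm_a == norm_b


print(canonical_type("opensearch", "INTEGER"))               # int
print(canonical_type("spark", "IntegerType"))                # int
print(canonical_type("opensearch", "IP", extensions={"ip"}))  # ip
```

Rejecting ip or geo_point unless the caller opts in keeps extension types visible instead of silently degrading them. Once every engine emits the unified names directly, the alias table would collapse to an identity mapping.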

Do you have any additional context?
