Draft: Extend Expr::ScalarUDF to support Expr for ScalarUDF #8222

Closed
26 changes: 23 additions & 3 deletions datafusion/expr/src/expr_fn.rs
@@ -22,15 +22,15 @@ use crate::expr::{
     Placeholder, ScalarFunction, TryCast,
 };
 use crate::function::PartitionEvaluatorFactory;
-use crate::WindowUDF;
 use crate::{
     aggregate_function, built_in_function, conditional_expressions::CaseBuilder,
     logical_plan::Subquery, AccumulatorFactoryFunction, AggregateUDF,
     BuiltinScalarFunction, Expr, LogicalPlan, Operator, ReturnTypeFunction,
     ScalarFunctionImplementation, ScalarUDF, Signature, StateTypeFunction, Volatility,
 };
+use crate::{ColumnarValue, WindowUDF};
 use arrow::datatypes::DataType;
-use datafusion_common::{Column, Result};
+use datafusion_common::{internal_err, Column, DataFusionError, Result};
 use std::ops::Not;
 use std::sync::Arc;

@@ -993,7 +993,27 @@ pub fn create_udwf(
 pub fn call_fn(name: impl AsRef<str>, args: Vec<Expr>) -> Result<Expr> {
     match name.as_ref().parse::<BuiltinScalarFunction>() {
         Ok(fun) => Ok(Expr::ScalarFunction(ScalarFunction::new(fun, args))),
-        Err(e) => Err(e),
+        Err(_) => {
+            // Constructing a `ScalarUDF` with only name and stub impl/return_type...
+            // This unresolved UDF will be resolved during analyzing using registered functions
Contributor

If you can make this work, it seems like a good idea to me.

However, it is not clear to me how the subsequent passes will know that they have to resolve this particular scalar UDF 🤔, so I am not sure this approach is viable.

Specifically, what would the code look like that the analysis pass uses to test whether a ScalarUDF is unresolved?
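
To make the concern concrete, here is a minimal, self-contained sketch (not DataFusion code) of what such a check might have to look like if an unresolved function is just an ordinary UDF with stub fields. Every type and name in it (`ScalarUdf`, `unresolved_stub`, `looks_unresolved`, `registered_null_fn`) is a hypothetical simplification, and the real `ScalarUDF` differs. Since the stub closure is opaque, a pass can only guess from metadata, and that guess also matches a legitimate function:

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for illustration only; not DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Null,
    Int64,
}

pub type Implementation = Arc<dyn Fn(&[i64]) -> Result<i64, String> + Send + Sync>;

pub struct ScalarUdf {
    pub name: String,
    pub arg_types: Vec<DataType>,
    pub return_type: DataType,
    pub fun: Implementation,
}

/// Builds a placeholder in the spirit of the diff: real name, stub everything else.
pub fn unresolved_stub(name: &str) -> ScalarUdf {
    ScalarUdf {
        name: name.to_string(),
        arg_types: vec![],
        return_type: DataType::Null,
        fun: Arc::new(|_: &[i64]| -> Result<i64, String> {
            Err("Unresolved function should not be called".to_string())
        }),
    }
}

/// Without an explicit marker, the check can only guess from metadata.
pub fn looks_unresolved(udf: &ScalarUdf) -> bool {
    udf.arg_types.is_empty() && udf.return_type == DataType::Null
}

/// A legitimate zero-argument function that happens to return Null.
pub fn registered_null_fn() -> ScalarUdf {
    ScalarUdf {
        name: "always_null".to_string(),
        arg_types: vec![],
        return_type: DataType::Null,
        fun: Arc::new(|_: &[i64]| -> Result<i64, String> { Ok(0) }),
    }
}
```

In this sketch `looks_unresolved` returns `true` for both `unresolved_stub("my_udf")` and `registered_null_fn()`, which is the kind of false positive an explicit marker would rule out.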

Contributor Author

That's a great point. Without an enum, the implementation seems vulnerable to bugs.
I think we can make either Expr::ScalarFunction or Expr::ScalarUDF use an enum; since the long-term goal is to remove BuiltinScalarFunction, the two should end up the same.
I'll check which option makes the future migration easier.

pub enum ScalarFunctionDefinition {
  /// Resolved to a user defined function
  UDF(ScalarUDF),
  /// A scalar function that will be called by name
  Name(Arc<str>),
}

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarUDF {
    /// The function
    pub fun: ScalarFunctionDefinition,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
}

pub enum ScalarFunctionDefinition {
  /// Resolved to a built in scalar function
  /// (will be removed long term)
  BuiltIn(built_in_function::BuiltinScalarFunction),
  /// Resolved to a user defined function
  UDF(ScalarUDF),
  /// A scalar function that will be called by name
  Name(Arc<str>),
}

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
    /// The function
    pub fun: ScalarFunctionDefinition,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
}
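
With a `Name` variant like the one proposed above, the "is this unresolved?" test becomes a plain pattern match. The following is a rough, self-contained sketch of how an analysis pass might use it. The simplified `Expr`, the `Udf` variant spelling, the `HashMap` registry, and the `resolve` and `is_unresolved` helpers are all assumptions made for illustration, not DataFusion's actual types or analyzer API:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins; not the DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct ScalarUdf {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ScalarFunctionDefinition {
    /// Resolved to a user defined function
    Udf(Arc<ScalarUdf>),
    /// A scalar function that will be called by name
    Name(Arc<str>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    ScalarFunction {
        fun: ScalarFunctionDefinition,
        args: Vec<Expr>,
    },
}

/// The unresolved check is an exact pattern match, not a heuristic.
pub fn is_unresolved(expr: &Expr) -> bool {
    matches!(
        expr,
        Expr::ScalarFunction { fun: ScalarFunctionDefinition::Name(_), .. }
    )
}

/// Recursively replaces every `Name` with the matching registered function.
pub fn resolve(
    expr: Expr,
    registry: &HashMap<String, Arc<ScalarUdf>>,
) -> Result<Expr, String> {
    match expr {
        Expr::ScalarFunction { fun, args } => {
            let fun = match fun {
                ScalarFunctionDefinition::Name(name) => {
                    let udf = registry
                        .get(&*name)
                        .ok_or_else(|| format!("unknown function: {name}"))?;
                    ScalarFunctionDefinition::Udf(Arc::clone(udf))
                }
                // Already resolved: leave untouched.
                resolved => resolved,
            };
            let args = args
                .into_iter()
                .map(|arg| resolve(arg, registry))
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Expr::ScalarFunction { fun, args })
        }
        other => Ok(other),
    }
}
```

A name that is missing from the registry surfaces as an error during resolution instead of reaching execution as a stub implementation.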

+            let placeholder_impl: ScalarFunctionImplementation =
+                Arc::new(|_: &[ColumnarValue]| {
+                    return internal_err!("Unresolved function should not be called");
+                });
+            let return_type_func: ReturnTypeFunction =
+                Arc::new(move |_| Ok(Arc::new(DataType::Null)));
+            let unresolved_udf = crate::ScalarUDF::new(
+                name.as_ref(),
+                &Signature::exact(vec![], Volatility::Immutable),
+                &return_type_func,
+                &placeholder_impl,
+            );
+
+            Ok(Expr::ScalarUDF(crate::expr::ScalarUDF::new(
+                Arc::new(unresolved_udf),
+                args,
+            )))
+        }
     }
 }
