Stack overflow while generating logical plan from statement #6040

Closed
mingnuj opened this issue Apr 18, 2023 · 0 comments · Fixed by #6360
Labels
bug Something isn't working

mingnuj commented Apr 18, 2023

Describe the bug

When a query uses many conditions, a stack overflow occurs while the logical plan is generated.

The limit is tighter when running under tokio, because the default thread stack size there is 2 MiB.
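
One possible workaround for a small default stack, sketched here with the standard library only, is to run the deeply recursive work on a dedicated thread whose stack size is set explicitly. `run_with_stack` and `depth_sum` are hypothetical names; `depth_sum` merely stands in for the recursive planning call, and the 16 MiB figure is an arbitrary choice.

```rust
use std::thread;

/// Runs `f` on a new thread with `stack_bytes` of stack and returns its result.
/// Hypothetical helper, not part of DataFusion or tokio.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

/// Stand-in for recursive planning: uses one call frame per level.
fn depth_sum(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        n + depth_sum(n - 1)
    }
}

fn main() {
    // 16 MiB is an assumption; pick a size that fits the expected query depth.
    let total = run_with_stack(16 * 1024 * 1024, || depth_sum(10_000));
    println!("sum = {total}");
}
```

For tokio worker threads, `tokio::runtime::Builder::thread_stack_size` plays a similar role. Either way this only raises the ceiling; it does not remove the recursion.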

To Reproduce

I adapted the reproduction code from issue #1434, provided by @mcassels. The query has the following shape:
SELECT * FROM table WHERE <condition0> OR <condition1> OR ...

use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    common::Result,
    config::ConfigOptions,
    error::DataFusionError,
    logical_expr::{
        logical_plan::builder::LogicalTableSource, AggregateUDF, ScalarUDF, TableSource,
    },
    sql::{
        planner::{ContextProvider, SqlToRel},
        sqlparser::{dialect::GenericDialect, parser::Parser},
        TableReference,
    },
};
use std::{collections::HashMap, sync::Arc};

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

#[tokio::main]
async fn main() -> Result<()> {
    // 255 conditions plan successfully; 256 overflow the stack in debug builds.
    for num_conditions in [255, 256] {
        // Build `column1 = 'value0' OR column1 = 'value1' OR ...`.
        let where_clause = (0..num_conditions)
            .map(|i| format!("column1 = 'value{:?}'", i))
            .collect::<Vec<String>>()
            .join(" OR ");
        let sql = format!("SELECT * from table1 where {};", where_clause);
        get_optimized_plan(sql).await?;
        println!("query succeeded with {:?} conditions", num_conditions);
    }

    Ok(())
}

async fn get_optimized_plan(sql: String) -> Result<()> {
    let schema_provider = TestSchemaProvider::new();

    let dialect = GenericDialect {};
    let ast = Parser::parse_sql(&dialect, &sql).unwrap();
    let statement = &ast[0];
    let sql_to_rel = SqlToRel::new(&schema_provider);
    sql_to_rel.sql_statement_to_plan(statement.clone()).unwrap();

    Ok(())
}

struct TestSchemaProvider {
    options: ConfigOptions,
    tables: HashMap<String, Arc<dyn TableSource>>,
}

impl TestSchemaProvider {
    pub fn new() -> Self {
        let mut tables = HashMap::new();
        tables.insert(
            "table1".to_string(),
            create_table_source(vec![Field::new(
                // Must match the column name used in the WHERE clause.
                "column1".to_string(),
                DataType::Utf8,
                false,
            )]),
        );

        Self {
            options: Default::default(),
            tables,
        }
    }
}

fn create_table_source(fields: Vec<Field>) -> Arc<dyn TableSource> {
    Arc::new(LogicalTableSource::new(Arc::new(
        Schema::new_with_metadata(fields, HashMap::new()),
    )))
}

impl ContextProvider for TestSchemaProvider {
    fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
        match self.tables.get(name.table()) {
            Some(table) => Ok(table.clone()),
            _ => Err(DataFusionError::Plan(format!(
                "Table not found: {}",
                name.table()
            ))),
        }
    }

    fn get_function_meta(&self, _name: &str) -> Option<Arc<ScalarUDF>> {
        None
    }

    fn get_aggregate_meta(&self, _name: &str) -> Option<Arc<AggregateUDF>> {
        None
    }

    fn get_variable_type(&self, _variable_names: &[String]) -> Option<DataType> {
        None
    }

    fn options(&self) -> &ConfigOptions {
        &self.options
    }
}

Output

query succeeded with 255 conditions

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

With 256 or more conditions, a stack overflow occurs. This happens only in debug mode and is related to #1434 (comment).
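
A likely reason the limit tracks the number of conditions is that a chain of `OR`s parses into a left-deep binary tree, so a recursive walk goes one level deeper per condition. The sketch below uses a hypothetical, simplified `Expr` type (not the sqlparser or DataFusion type) to show the depth growing linearly.

```rust
/// Hypothetical, simplified expression tree for illustration only.
#[allow(dead_code)]
enum Expr {
    Cond(usize),
    Or(Box<Expr>, Box<Expr>),
}

/// Folds `n` conditions (n >= 1) the way a left-associative parser would:
/// ((c0 OR c1) OR c2) OR ...
fn or_chain(n: usize) -> Expr {
    (1..n).fold(Expr::Cond(0), |acc, i| {
        Expr::Or(Box::new(acc), Box::new(Expr::Cond(i)))
    })
}

/// Recursive depth of the tree: one call frame per nesting level.
fn depth(e: &Expr) -> usize {
    match e {
        Expr::Cond(_) => 1,
        Expr::Or(l, r) => 1 + depth(l).max(depth(r)),
    }
}

fn main() {
    for n in [1, 2, 255, 256] {
        println!("{n} conditions -> depth {}", depth(&or_chain(n)));
    }
}
```

Each extra condition adds one level, so any per-level stack cost in the planner is multiplied by the number of conditions.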

Expected behavior

Planning should complete without a stack overflow.

Additional context

I can think of two approaches to this problem.

Approach #1
Some functions, such as select_to_plan and plan_selection, receive their parameters without using references or Box pointers. This may make the stack grow faster.

I also found a discussion of how enums are allocated on the stack:
https://www.reddit.com/r/rust/comments/zbla3j/how_does_enums_work_where_are_they_allocated/
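
To illustrate why by-value parameters could matter: an enum is as large as its largest variant, and a value held or passed by value takes that much space in each frame, whereas a `Box` or a reference is pointer-sized. `BigNode` is a hypothetical type used only for illustration; it does not correspond to any DataFusion type.

```rust
use std::mem::size_of;

/// Hypothetical enum with one large variant; every value is at least that large.
#[allow(dead_code)]
enum BigNode {
    Small(u8),
    Large([u64; 32]), // 256 bytes of payload
}

fn main() {
    // The enum needs room for its largest variant plus a discriminant.
    println!("BigNode by value: {} bytes", size_of::<BigNode>());
    // A Box is a single pointer regardless of the size of the pointee.
    println!("Box<BigNode>:     {} bytes", size_of::<Box<BigNode>>());
    // A shared reference is likewise pointer-sized.
    println!("&BigNode:         {} bytes", size_of::<&BigNode>());
}
```

In a recursive function, the by-value size is paid once per level, which is where boxing or borrowing large values might reduce stack use.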

Approach #2
Running the above example under AddressSanitizer reported the error in fmt::Display, but I'm not sure exactly where it happened.

This may be related to Rust issue rust-lang/rust#45838.
