Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: stage created with quoted name can not be corretly refereced with @<stage_name> #16521

Closed
youngsofun opened this issue Sep 25, 2024 · 2 comments · Fixed by #16522
Closed
Labels
C-bug Category: something isn't working

Comments

@youngsofun
Copy link
Member

youngsofun commented Sep 25, 2024

Summary

create stage, like create table, accept any valid ident, when there are special characters, quote is used.
however, @<stage_name> use escaping instead of quoting for '\s' etc,
this does not make sense.

    #[regex(r#"@([^\s`;'"()]|\\\s|\\'|\\"|\\\\)+"#)]
     LiteralAtString,

This bug has existed for a long time, and almost no one has complained about it, so there is reason to believe that a simple stage name (plain ident) will be sufficient.

   #[regex(r#"[_a-zA-Z][_$a-zA-Z0-9]*"#)]
    Ident,
@dosubot dosubot bot added the C-bug Category: something isn't working label Sep 25, 2024
@youngsofun
Copy link
Member Author

youngsofun commented Sep 25, 2024

cc @flaneur2020 @everpcpc @wubx @Xuanwo

Copy link

dosubot bot commented Sep 25, 2024

Hey there, @youngsofun! 👋 I'm here to help you out with any bugs, questions, or contributions while you wait for a human maintainer. I'm a bot designed to assist you in any way I can.

To fix the bug where a stage created with a quoted name cannot be correctly referenced with @<stage_name> due to the use of escaping instead of quoting for special characters, you need to ensure that the stage name is properly quoted when it is referenced. This can be done by modifying the parsing logic to handle quoted identifiers correctly.

Here is a potential fix in the create_stage parsing logic:

let create_stage = map_res(
    rule! {
        CREATE ~ ( OR ~ ^REPLACE )? ~ STAGE ~ ( IF ~ ^NOT ~ ^EXISTS )?
        ~ ( #quoted_stage_name )
        ~ ( (URL ~ ^"=")? ~ #uri_location )?
        ~ ( #file_format_clause )?
        ~ ( ON_ERROR ~ ^"=" ~ ^#ident )?
        ~ ( SIZE_LIMIT ~ ^"=" ~ ^#literal_u64 )?
        ~ ( VALIDATION_MODE ~ ^"=" ~ ^#ident )?
        ~ ( (COMMENT | COMMENTS) ~ ^"=" ~ ^#literal_string )?
    },
    |(
        _,
        opt_or_replace,
        _,
        opt_if_not_exists,
        stage,
        url_opt,
        file_format_opt,
        on_error_opt,
        size_limit_opt,
        validation_mode_opt,
        comment_opt,
    )| {
        let create_option =
            parse_create_option(opt_or_replace.is_some(), opt_if_not_exists.is_some())?;
        Ok(Statement::CreateStage(CreateStageStmt {
            create_option,
            stage_name: stage.to_string(),
            location: url_opt.map(|(_, location)| location),
            file_format_options: file_format_opt.unwrap_or_default(),
            on_error: on_error_opt.map(|v| v.2.to_string()).unwrap_or_default(),
            size_limit: size_limit_opt.map(|v| v.2 as usize).unwrap_or_default(),
            validation_mode: validation_mode_opt
                .map(|v| v.2.to_string())
                .unwrap_or_default(),
            comments: comment_opt.map(|v| v.2).unwrap_or_default(),
        }))
    },
);

In this fix, #quoted_stage_name should be a rule that correctly parses quoted identifiers, ensuring that special characters are handled properly. This way, when the stage name is referenced with @<stage_name>, it will be correctly interpreted.

Additionally, ensure that the resolve_stage_location function correctly handles quoted stage names:

#[async_backtrace::framed]
pub async fn resolve_stage_location(
    ctx: &dyn TableContext,
    location: &str,
) -> Result<(StageInfo, String)> {
    let names: Vec<&str> = location.splitn(2, '/').filter(|v| !v.is_empty()).collect();

    let stage = if names[0] == "~" {
        StageInfo::new_user_stage(&ctx.get_current_user()?.name)
    } else {
        UserApiProvider::instance()
            .get_stage(&ctx.get_tenant(), names[0])
            .await?
    };

    let path = names.get(1).unwrap_or(&"").trim_start_matches('/');
    let path = if path.is_empty() { "/" } else { path };

    debug!("parsed stage: {stage:?}, path: {path}");
    Ok((stage, path.to_string()))
}

This ensures that the stage name is correctly parsed and resolved, even if it contains special characters.

Additionally, the StageInfo struct and related methods should be updated to handle quoted stage names properly:

/// Create a new user stage.
pub fn new_user_stage(user: &str) -> StageInfo {
    StageInfo {
        stage_name: user.to_string(),
        stage_type: StageType::User,
        ..Default::default()
    }
}

/// Update user stage with stage name.
pub fn with_stage_name(mut self, name: &str) -> StageInfo {
    self.stage_name = name.to_string();
    self
}

/// Get the prefix of stage.
///
/// Use this function to get the prefix of this stage in the data operator.
///
/// # Notes
///
/// This function should never be called on external stage because it's meaningless. Something must be wrong.
pub fn stage_prefix(&self) -> String {
    match self.stage_type {
        StageType::LegacyInternal => format!("/stage/{}/", self.stage_name),
        StageType::External => {
            unreachable!("stage_prefix should never be called on external stage, must be a bug")
        }
        StageType::Internal => format!("/stage/internal/{}/", self.stage_name),
        StageType::User => format!("/stage/user/{}/", self.stage_name),
    }
}

By ensuring that the stage name is correctly quoted and parsed at all stages of its lifecycle, you can avoid issues related to special characters in stage names.

Additionally, the ident_needs_quote and display_ident functions from quote.rs should be used to determine if a stage name needs quoting and to display it correctly:

pub fn ident_needs_quote(ident: &str) -> bool {
    if ident.is_empty() {
        return true;
    }

    // avoid quote the special identifier "~" which is an available stage name
    if ident == "~" {
        return false;
    }

    let mut chars = ident.chars();
    let first = chars.next().unwrap();
    if !first.is_ascii_alphabetic() && first != '_' {
        return true;
    }

    for c in chars {
        if !c.is_ascii_alphanumeric() && c != '_' && c != '$' {
            return true;
        }
    }

    false
}

pub fn display_ident(name: &str, quoted_ident_case_sensitive: bool, dialect: Dialect) -> String {
    // Db-s -> "Db-s" ; dbs -> dbs
    if name.chars().any(|c| c.is_ascii_uppercase()) && quoted_ident_case_sensitive
        || ident_needs_quote(name)
    {
        QuotedIdent(name, dialect.default_ident_quote()).to_string()
    } else {
        name.to_string()
    }
}

By integrating these functions, you ensure that stage names are correctly quoted and displayed, preventing issues with special characters [1][2][3][4].

To continue talking to Dosu, mention @dosu.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant