Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lint to prevent (most) URLs #104 #106

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

KoxyG
Copy link

@KoxyG KoxyG commented Sep 4, 2024

Pls review and let me know if there is anything that needs changes @SamWilsn

// Report the issue
let source = self.ctx.source_for_text(ast.sourcepos.start.line, text);
let offending_url = self.re.find(text).unwrap().as_str();
let suggestion_label = format!("Avoid using backticks in URLs: `{}`", offending_url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't investigated the code yet, but this text seems to indicate it does the opposite of what we need.

The lint is supposed to make sure authors don't put URLs in backticks, not that they don't put backticks in URLs.

Comment on lines 15 to 16
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct PreventUrlsNoBackticks<S>(pub S);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably look something like:

Suggested change
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct PreventUrlsNoBackticks<S>(pub S);
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct PreventUrlsNoBackticks<S> {
allowed_domains: Vec<S>,
}

Then the default configuration in lib.rs would look like:

        (
            "markdown-prevent-url",
            MarkdownPreventUrl {
                allowed_domains: vec![
                    r"(.+\.)?example.com",
                    r"(.+\.)?invalid",
                    // And so on...
                ]
            }
        ), 

header: value1
---

Check this URL: `http://example.com/page?query=foo`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should write tests based on the examples in #104. This URL is very close to one of the examples I gave that should not trigger the lint.

In other words, this URL should be allowed because its domain is example.com—one of the allowed example domains at the top of #104.

@KoxyG
Copy link
Author

KoxyG commented Sep 7, 2024

Hi @SamWilsn I just made changes. Kindly check and help review.

Copy link
Contributor

@SamWilsn SamWilsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review. You'll need to resolve the compilation issues before we can go any further.


// Regex to match URLs not from allowed domains and containing backtick

let re = Regex::new(r"https?://(?:example\.com|example\.net|example\.org|example|invalid|test)(?:/[^`\s]*)?").unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't quite figured out what this regex is for yet, but you shouldn't hardcode the list of domains here. It could get out of sync with self.allowed_domains.

// If the URL is in backticks, report it
if in_backticks {
let source = self.ctx.source_for_text(ast.sourcepos.start.line, text);
let message = format!("URLs are not allowed in backticks: `{}`", url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let message = format!("URLs are not allowed in backticks: `{}`", url);
let message = "Valid URLs are not allowed in backticks";

I think we can use a simpler error message here.

title: Some(Annotation {
annotation_type: self.ctx.annotation_type(),
id: Some(self.slug),
label: Some("not from allowed domain found"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
label: Some("not from allowed domain found"),
label: Some("non-reserved domain"),

How about that?

Comment on lines +110 to +133
// Report other disallowed URLs
let source = self.ctx.source_for_text(ast.sourcepos.start.line, text);
let suggestion_label = format!("This URL must be hyperlinked or from an allowed domain: `{}`", url);

self.ctx.report(Snippet {
opt: Default::default(),
title: Some(Annotation {
annotation_type: self.ctx.annotation_type(),
id: Some(self.slug),
label: Some("Disallowed URL in plain text or code fence"),
}),
slices: vec![Slice {
fold: false,
line_start: ast.sourcepos.start.line,
origin: self.ctx.origin(),
source: &source,
annotations: vec![],
}],
footer: vec![Annotation {
annotation_type: AnnotationType::Help,
id: None,
label: Some(&suggestion_label),
}],
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious why this section is separate from the above one. Could you explain?

Comment on lines +145 to +171
impl<'a, 'b, 'c> tree::Visitor for Visitor<'a, 'b, 'c> {
type Error = Error;

fn enter_link(&mut self, _ast: &Ast, _link: &str) -> Result<Next, Self::Error> {
Ok(Next::TraverseChildren)
}

fn enter_code(&mut self, ast: &Ast, code: &comrak::nodes::NodeCode) -> Result<Next, Self::Error> {
self.check(ast, &code.literal, false, false)
}

fn enter_text(&mut self, ast: &Ast, text: &str) -> Result<Next, Self::Error> {
self.check(ast, text, false, false)
}

fn enter_html_inline(&mut self, ast: &Ast, html: &str) -> Result<Next, Self::Error> {
// For inline HTML links like <https://link>
self.check(ast, html, true, false)
}

fn enter_code_block(&mut self, ast: &Ast, block: &comrak::nodes::NodeCodeBlock) -> Result<Next, Self::Error> {
let code_text = String::from_utf8_lossy(&block.literal);
self.check(ast, &code_text, false, false)
}


}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing a few cases here I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants