Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(es/hygiene): block scope #3219

Closed
wants to merge 17 commits into from

Conversation

magic-akari
Copy link
Member

@magic-akari magic-akari commented Jan 9, 2022

Description:
This pull request try to fix block scope issues.

Previous behavior:
Rename every ident if conflict.

Current behavior:
Only the idnet marked as block_scoped will be renamed.

block_scoped ident is an identifier which is var, but should behave like let/const.
There is a similar definition at Babel.

  • When let/const transforms to var, we mark it as block_scoped ident.
  • We assume all private_ident is block_scoped ident.

TODO:

  • There are too many cases where privae_ident should be used but is not, which will result in name conflicts.
  • Ident::new or direct construction should be disabled. It is better only use private_ident and quote_ident to construct Ident.

BREAKING CHANGE:

Related issue (if exists):

@magic-akari magic-akari force-pushed the fix/issue-3213 branch 4 times, most recently from 2fe7966 to e6cb017 Compare January 19, 2022 17:16
@magic-akari magic-akari changed the title fix(es/hygiene): Avoid redundant renames fix(es/hygiene): block scope Jan 19, 2022
@magic-akari magic-akari force-pushed the fix/issue-3213 branch 3 times, most recently from a0c7378 to 315903b Compare January 20, 2022 05:00
@magic-akari
Copy link
Member Author

Hi @kdy1 ,
There are some issues with swcHelpers need to be resolved.

  • With config externalHelpers: true, we should mark swcHelpers as block_scoped by using private_ident.
    And pass ident into transformed code as ref ident to ensure they are connected when renaming.

  • With config externalHelpers: false, we should mark all exposed ident as private_ident and ref them during transformation. By doing so, we will solve the issue mentioned by @Austaras .

The pull request is not completed. But it is better to review it now.

@kdy1
Copy link
Member

kdy1 commented Jan 20, 2022

With config externalHelpers: true, we should mark swcHelpers as block_scoped by using private_ident.
And pass ident into transformed code as ref ident to ensure they are connected when renaming.

Why should we make swcHelpers block scoped? Single Mark for swcHelpers is enough I think?

With config externalHelpers: false, we should mark all exposed ident as private_ident and ref them during transformation. > By doing so, we will solve the issue mentioned by @Austaras .

Why? We can use a helper mark, shared by helpers.

@magic-akari
Copy link
Member Author

magic-akari commented Jan 20, 2022

With config externalHelpers: true, we should mark swcHelpers as block_scoped by using private_ident.
And pass ident into transformed code as ref ident to ensure they are connected when renaming.

Why should we make swcHelpers block scoped? Single Mark for swcHelpers is enough I think?

With config externalHelpers: false, we should mark all exposed ident as private_ident and ref them during transformation. > By doing so, we will solve the issue mentioned by @Austaras .

Why? We can use a helper mark, shared by helpers.

If there are some codes as following:

{
    var foo = 1;
}

console.log(foo);

Should we rename it? It depends.
If the code is user-entered, keep the name. The foo should be hoisted out from the block.

But if they are transformed from following:

REPL

{
    let foo = 1;
}

console.log(foo);

Should we rename it? No, the foo is block scoped and the foo in console.log refers to unresolved binding which may be window.foo.

How do we recognize them?
Mark is not enough. We need more information.
When a block-scoped Ident (let or const) is transformed into var, mark it as block_scoped.
Babel did it in a similar way.

We treat all Ident constructed during code transformed as block_scoped as well.
In the process of code transformation, We use private_ident to avoid naming conflicts.
In most cases, we do not want private_ident to be hoisted. It makes sense to mark private_ident as block_scoped.

In result we should never rename Ident that untouched by code transformation.
We only resolve the case of naming conflicts caused by our code transform.

If there is naming conflicts in user input, it's better to report it from lint and leave it untouched.

@kdy1
Copy link
Member

kdy1 commented Jan 20, 2022

I mean, the helper mark is almost the same as private_ident, but just shared by various helper functions.

All helpers are injected on top-level, and we can detect conflict from hygiene, without any more information. If user references an unresolved identifier, it's decl-ref conflict as helpers use their own SyntaxContext.

@kdy1
Copy link
Member

kdy1 commented Jan 20, 2022

Private identifier is simply a symbol + unique SyntaxContext, while helper identifier is a symbol + unique SyntaxContext (per file, shared among helpers in a single module)

ctxt,
block_scoped: bs,
..
}| ctxt == &id.1 && (block_scoped || *bs),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdy1
Do you mean that if we detect some specific SyntaxContext, we should treat it as private_ident as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually SyntaxContext is the trick behind private_ident!.

private_ident! works by creating a unique SyntaxContext (by SyntaxContext::empty(Mark::fresh(Mark::root()))), and creating Ident witth provided symbol and the SyntaxContext we made.

Actual renaming is done by the hygiene pass.

Copy link
Member Author

@magic-akari magic-akari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kdy1 ,
In the original code, The first Ident will be renamed.

let cur_scope_conflict = if ctxts_of_decls.iter().any(|(_, ctxt)| ctxt == &id.1) {

For imported swcHelpers, the first Ident is what we import and should be renamed.

But it cannot handle these two cases at the same time
REPL

var a = 1;

{
    let a = 1;
}

REPL

{
    let a = 1;
}

var a = 1;

Notice: only the let a should be renamed.

I handle these two cases by detect the block_scoped.

It means I cannot determine which one should be renamed if two Ident are in the same context without extra information.

}));
return _foo.apply(this, arguments);
}
var swcHelpers = 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kdy1 ,
I add a test case.
It seems to be a regression in my changes.

Copy link
Member

@kdy1 kdy1 Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two identifier has same symbol and same syntax context, those are equal to each other.
If two different variables get the same syntax context, then it's bug of resolver pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this REPL, we get the wrong result because we can't distinguish which one needs to be renamed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Then we may remove the logic for retaining identifiers of top-level items?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not related to top level. We should not touch the var Ident.
See this swc playground and this Babel REPL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... You are right, although I'm not sure how can we fix this.

Scoping in rust is super-hard because of no two &mut to same data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check the first commit in this pull request that I added some test cases about it.

@kdy1
Copy link
Member

kdy1 commented Jan 20, 2022

Honestly I don't think adding a field to Ident is a good idea.

@magic-akari
Copy link
Member Author

Without adding a field to the Ident, is there any way to mark a Ident as block_scoped in code transformation and consume it in hygiene pass?

FYI, Babel store them in NodePath.

@kdy1
Copy link
Member

kdy1 commented Jan 20, 2022

If we can collect block-scoped Ids from somewhere, it would be possible. If we only make block-scoped identifiers some of passes, we can store it in some variable (e.g. shared by resolver, block_scoping, hygiene)

@magic-akari
Copy link
Member Author

If we can collect block-scoped Ids from somewhere, it would be possible. If we only make block-scoped identifiers some of passes, we can store it in some variable (e.g. shared by resolver, block_scoping, hygiene)

chain!(
self.pass,
Optional::new(private_in_object(), syntax.private_in_object()),
compat_pass,
// module / helper
Optional::new(
modules::import_analysis::import_analyzer(Rc::clone(&module_scope)),
need_interop_analysis
),
compat::reserved_words::reserved_words(),
Optional::new(export_namespace_from(), need_interop_analysis),
Optional::new(helpers::inject_helpers(), self.inject_helpers),
ModuleConfig::build(
self.cm.clone(),
base_url,
paths,
base,
self.top_level_mark,
module,
Rc::clone(&module_scope)
),
as_folder(MinifierPass {
options: self.minify,
cm: self.cm.clone(),
comments: comments.cloned(),
top_level_mark: self.top_level_mark,
}),
Optional::new(
hygiene_with_config(self.hygiene.clone().unwrap_or_default()),
self.hygiene.is_some() && !is_mangler_enabled
),
Optional::new(fixer(comments.map(|v| v as &dyn Comments)), self.fixer),
)

Maybe I can pass a hashset to collect Ident from here.

@magic-akari magic-akari force-pushed the fix/issue-3213 branch 7 times, most recently from a5077ef to 67e3362 Compare February 1, 2022 08:56
@magic-akari magic-akari marked this pull request as ready for review February 1, 2022 09:15
Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the hard work!

crates/swc_ecma_transforms_base/src/hygiene/tests.rs Outdated Show resolved Hide resolved
@@ -11,7 +11,7 @@ use swc_ecma_transforms_testing::{compare_stdout, test, test_exec, Tester};

test!(
::swc_ecma_parser::Syntax::default(),
|_| block_scoping(),
|_| block_scoping(Default::default()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swc_ecma_transforms_testing will apply hygiene(Default::default()).
I'm concerned a bit because usages of different values may make bug hidden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's enough for other test suites, but I'm not sure if it's fine for tests for block_scoping

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.
I just updated the test config to use resolver and hygiene.

crates/swc_ecma_transforms_base/src/ident_scope.rs Outdated Show resolved Hide resolved
@@ -136,6 +136,7 @@ fn simple() {
}

#[test]
#[ignore]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ignored?

@magic-akari magic-akari force-pushed the fix/issue-3213 branch 2 times, most recently from 9aac356 to 07d63ee Compare February 2, 2022 15:34
@@ -313,8 +317,8 @@ fn for_loop() {
},
"
for(var a=1;;) {}
for(var a1 of foo) {}
for(var a2 = 3;;) {}
for(var a of foo) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main idea of this PR is to leave the user input vars untouched.
But when updating this test file, I found many cases where the var was intentionally changed.
Maybe I'm thinking in the wrong way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it's inevitable, as var and let/const have different scoping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are some cases where var ident of different scopes are together and conflict with each other, this pull request is meaningless and I will close it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those updates are mostly not intentional.
The renamed vars in tests are renamed just because it works and it was required to pass the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, almost all renamings in his files are intentional.
They have different syntax contexts, so those are different variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests in this file are extracted from other tests, and those are tests for injected identifiers.

crates/swc_ecma_transforms_base/src/ident_scope.rs Outdated Show resolved Hide resolved
})
.or_insert_with(|| self.ident_scope_kind);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But seems like this does the exact thing collect_bindings do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we collect Bindings.

let a = 1,
  [b, ...c] = [2, 3],
  { d: e } = {},
  { f = g } = {},
  { h: i = j } = {},
  { ...k } = {};

In this case, we got a, b, c, e, f, i and k.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you used another visitor instead of using collect_bindings?
Each visitor uses more space in the final binary than expected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, do you mean BindingCollector, I just find it.

Copy link
Member Author

@magic-akari magic-akari Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried the collect_decls but got the wrong result. Maybe I misused it, I will try again.
Maybe I should use find_ids, but I need to ignore key in KeyValuePatProp and value in AssignPatProp.

Just like what I do as mentioned:

/// let { x: y } = z;
fn visit_key_value_pat_prop(&mut self, n: &KeyValuePatProp) {
n.value.visit_with(self);
}
/// let { x = y } = z;
fn visit_assign_pat_prop(&mut self, n: &AssignPatProp) {
n.key.visit_with(self);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, find_ids works well.

@magic-akari magic-akari marked this pull request as draft February 3, 2022 10:21
@magic-akari
Copy link
Member Author

magic-akari commented Feb 3, 2022

another case

input:

if (true) {
    const a = 1;
    const b = 2;
    function foo(a, b) {
        use(a, b);
    }
} else {
    const a = 1;
    const b = 2;
    function bar(a, b) {
        use(a, b);
    }
}

expect output:

if (true) {
    var foo = function foo(a, b) {
        use(a, b);
    };
    var a = 1;
    var b = 2;
} else {
    var bar = function bar(a, b) {
        use(a, b);
    };
    var a1 = 1;
    var b1 = 2;
}

main branch output(REPL):

if (true) {
    var foo = function foo(a3, b3) {
        use(a3, b3);
    };
    var a2 = 1;
    var b2 = 2;
} else {
    var bar = function bar(a, b) {
        use(a, b);
    };
    var a1 = 1;
    var b1 = 2;
}

this PR output (as same as main branch):

if (true) {
    var foo = function foo(a3, b3) {
        use(a3, b3);
    };
    var a2 = 1;
    var b2 = 2;
} else {
    var bar = function bar(a, b) {
        use(a, b);
    };
    var a1 = 1;
    var b1 = 2;
}

I will take a look at why I got this result.

@magic-akari
Copy link
Member Author

Closed due to outdated

@magic-akari magic-akari deleted the fix/issue-3213 branch February 26, 2022 16:09
@magic-akari magic-akari restored the fix/issue-3213 branch February 28, 2022 03:59
@magic-akari magic-akari deleted the fix/issue-3213 branch March 6, 2022 15:23
@swc-project swc-project locked as resolved and limited conversation to collaborators Oct 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
2 participants