Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tsx): Extract positions based on utf-16 encoding #1037

Merged
merged 4 commits into from
Aug 6, 2024

Conversation

Princesseuh
Copy link
Member

@Princesseuh Princesseuh commented Aug 5, 2024

Changes

Previously when extracting script and style tags, we tried to somewhat make it work with multibytes characters by counting them and skipping, not only was this cumbersome, it kinda didn't work because our loop would run multiple times over kinda the same characters, anyway it was annoying.

In this PR, I changed it so that we use the lineoffsets table from the sourcemapping logic to get the offsets. This is somewhat slower, especially in some extreme cases, but in most cases there's no difference, and at least it's now correct.

I also updated the frontmatter and body ranges extraction to use this method, as they suffered from the same problem

In theory, this is a breaking change, but the truth is that the numbers it'd spit out would be unusable in JS unless you did a lot of conversion yourself, now the numbers can be used as-is. Also, I doubt anyone other than me is using them...

Fixes withastro/language-tools#921

Testing

Tests should pass + updated some + added more

Docs

N/A. Though I did a JSdoc comment on the type to say that it's UTF-16 based.

Copy link

changeset-bot bot commented Aug 5, 2024

🦋 Changeset detected

Latest commit: 2b2a8da

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@astrojs/compiler Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Member

@bluwy bluwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel a bit non-performant, but if there isn't any noticable slowdown in benchmarks, then it's probably fine then.

Also, this needs a changeset before merging.

@Princesseuh
Copy link
Member Author

Yeah, I was surprised to not find too much perf difference. To be clear there is one, but I could only get real differences in files with 10k+ characters, multiple script and style tags late into the file and obscene amount of emojis.

I'll take a quick look to see if there's a way to re-use the line offset table from the sourcemapping logic, it'd speed up things a bunch, otherwise I'm not too bothered, this still ends up being faster for the language server because the previous logic it did to get script and style tags was expensive

@bluwy
Copy link
Member

bluwy commented Aug 5, 2024

The power of Go and native binaries 😄

@Princesseuh
Copy link
Member Author

Refactored to use the sourcemapping line offsets instead, it's much faster! Sourcemapping is still the ultimate bottleneck, though.


// One of the more performance-sensitive parts of the compiler is the handling of multibytes characters, this benchmark is an extreme case of that.
func BenchmarkPrintToTSX(b *testing.B) {
source := `🌘🔅🔘🍮🔭🔁💝🐄👋 🍘👽💽🌉🌒🔝 📇🍨💿🎷🎯💅📱🎭👞🎫💝🍢🕡 augue tincidunt 👠💋🌵💌🍌🏄🕂🍹📣🍟 🐞🍲👷🗻🏢🐫👣🐹🔷🎢👭🍗👃 🐴🐈📐💄 et, 👽👽🔙🐒🔙🍀 🐞🍐🎵💕🍂🍭🎬🎅 ac 🏃👳📑🐶👝🔷 🔕👄🐾👡🍢💗 🌓🔙🌔🏈🔒🔄🎹🎐 🎾🐝🌁📃💞📔🔕🐕 vel 🌟🏁💴🎾🔷📪💼👣📚 👟💻🗾🎋💁🏬🐮💑 🍈🌸🍓🍥💦 et vivamus 🍑🍫🔉🔹💽🍙 rhoncus eu 📰💫💀🌺 🔋💾🔱🔷🐟 convallis facilisi vitae mollis 📨🔚💮🔃🎀🍶 🌠📑🎹🌑📫🗽🐊💁🔼🕓 ac 🌂🌳🌺🎭🍧🐑 🔭🔉💉🍗👠🍦 🔄👔🎭🍇 🏤🍂🌝👜🔺 ornare 🔽🌰🎃💝👩 🐬🌻🍩👺🎆📣 risus 🌼👧🌒🍄💄🌆🐖👐📠 🕙🍈👞🏢👅🎽🏫👃🌾🍘🕑🎼💆🎳 🐻📵🔂🍩🕦👕🐶 💚🏮📟📈🌄👱🔚 sem 🔵💱💭💫 libero bibendum 🌿🍮🎧🍴💉 🐵🔷🍒🍜 sed metus, aliquam 🐷🐬📇👔🎴🔻🏊📴🍂🎽 🏀💡🍺🌾 💣🍇🐼🌀🐟🍂🐰 sed luctus 👾🍠👻💬📋🐈👀🌕💥🐹 consectetur commodo at 📓🐣🔮🐍🔺🍐🗻🍃🍠🔬 🏁🎬🐔🌙 🎿🎊🍳💓👹🐏 👦🐩👶💻 🔉💏📧🔲🍈📹💫🎧 🌟🍯🔭🎿 🎹🗻💳🔄🕁💂 🐩👠🌗🍹 facilisis 👍🍑💤🐕🐞🐻🏩 posuere 🎑🍢🍑🏨📗👣 ultrices. Vestibulum 🌠👰💼📐🍺💫 👘🐖🔕🔤🐖💶🐢📵👍🍪 🔸🎿🔤🍡💡🌄📁👉🎎🎆🎢🔒 📴👉🌱🐍🌓🎆🏩🐀👓💹🎴 🔴🌒🔘👡🕜 🌝🐉🐑🏮🎸🐳💉🎄 🍳🐲🔭📆🐎🔼🎐🐩💾🍈🎶💅🐜🍀. 🍭🍄🌳🍏💍👈🌆 🔦🏦👵🌹🏊🎐📳🌃📘🌷💎📓🎼 🎳🔁🔂🗽 💅🏰🐊👂🌴 💍🕗🐘🔼🔰🏀 🎂📻🕛🍔🎄 vitae 📎🐆🍚🌲🐰 ipsum 👘👎📅🎈🌆💮🔁🎤 💺💉💉👐🔛🐛🐺🍸📭 💠👞🍨💖 integer 🐌📀🍂🍍🐼 volutpat, condimentum elit 🎌🌕🏊👼 🌛🕜🌚🕤🍎 🏢💨👓👤 duis amet 📹🕚🔏🏧🌷🍄👦 🔠🐬👬🍩📫🏧🎷🔢💆🎭🔳👈 🔇👿💈👧🐍🍱🔉 in 🌐🎲🌠🌟🐀👛🌞🎧 sed. Tristique malesuada id 🏆🍕🕚🎯🌶 📺🐘🏀🐮🔶🏀🍥👬💞🎃🌙🎥🔦🍗 👘👣🐂🎪🔂🎾 👎👮💈🗻🌿💰📩🔋💭💃 🍍🕑🔋👠🍆🍈👰🌅 orci nam 🔀🕠🏄🍣👘🐲💘🐥 🏥🎊🎯👾📅 👛🌽🏫🔃👋🐀👶💥🔳 🕥👪👑👯 🐓💡🔠💼🔳💲 👬🔁🍜💘🔌🌎🏃🌶🍏🍯🎺 🔟📨🔜🎱💓🔛 accumsan 🎢🐭🍉🍳🕜🏆🔗📝👿 🍗👑📜🎁🍇🍕🌛 🎓🌰👣📆🌋 👌🍱💏🔇🐆🌞📝💻🌺 dictumst 📡🍝🌐💤🐅🔔🐟📥🌓🌒🍅📨🏡👺🍬🐟 🔹💳🍨💡🎋💕🕜🏦👟🕒🕣💅💗 🏤🔈📀📜📙🍌 📫🕘👍💾📱 purus 🕒🐕👵🏄💗🐤🕝 pharetra adipiscing elit, non 🏬👸🍉👑 🎠🌇👰💄 🌕🗼🏮💇🐗 📌🏤🌲👹📌 🎐🕓📎🏤💾 tellus 💆🍨👂🕟💚🎋🌠🐳 🍘🍫🎵📚📟🔛🐑🏭🔄👌📏👚🏄🐞 💏🍊👾🍘👰📄💯🔑 proin 🐡🔝🐳🎻🐶🍜 🍕🌵📯🔖💅🕔💘🏁🌾💨🔲🔼🍜🍘🕂 in suscipit 🍎🎄🎬🔙👪💣🍣💯🕧 lectus 🌾🍚🍺🐆💉🎡👷 📖🔅👼🕞 🌄🌃📦🔲💘🌶💁🍯💿 senectus 👻💈🗽🍈 📜🍉🍒🍐🔖👙🔀🐅👙 🐬📷🎨👹🎬 🐷🌐🔽💨🎿🌌🌒 🎎🍊🔕🍁. 🍈🌄📧🏊📛👗🕟 🍋💌🔁🐛💫🔰👃🕑 ridiculus mattis 🕓🍌👘🌹 👅💏🍨🔯🍂🕃🌝👠🏁🔔 pellentesque elit eu, 💝🍧💯🔄📈📛🐻📆🔱📴🔸🍻🐘🎀 viverra 🌆🍉🍉🌆 📪🍕🏊🌜📺🔆🐢🎃 mi sed tellus luctus 🍜🍃🔶🗽 laoreet dui tristique 🔆🌸🐛🔣🏤📘🕜🍃🐋 pretium ultrices 🔳🏇🍏🎭🍀 🍱🐠👬📮🌗🌆💺🔆👨🌱 et 👸📁📩🔌👯👏👫👳💸 🐎💉🔱💦👴 🏪🍶🐸🔤 💔🏤👺👻🏤🎥🐽🔦👌🔡📚💡🔁🎻 🎵👡🕀🎩 🍸🐒🍭🕠🔢🎧🍚💪 🌸🍰💏🏁🎥🐕🔬💶 💃🌽💕🐭 💯🍁📭🐚📛📓🌏🔯📦 🕝👾📒📳💵 🏠💺🕔🕁📃 adipiscing nulla congue 🔖💲🕚🎰🔋👬 sem 👽👵👜👇 vehicula 📑🌎🍁💈 consectetur nulla ullamcorper enim, 🏫🍃🐢🔄🍝🕢👖💨👺💰👎💳🏠🏤 vel fermentum porttitor lacus 📱💧🔜🔰👱🎰🐉🔓🕧🍌🕙 🎴🍫🐱🐟🌖🕞🏥🍝📜🔃🏈🔡🐁🔗 💊🍐📝👭🎢🐳🐯📷🐀🔃. 🐦🍚🔵🕜🏩 id 🌔🔤💿🌻🍹 🌎🎈📢🔬🐖💢👸🎭 🍤🔵🔓🕕📪💢🔁👽💴 🐛👶🔢🎹🏄👜📌🍭🔼🎫🍯🎦💎🐦 🐨💧🍴🌊🔁 consequat pretium 👍🏬🏰👖🍥📺👛🌕 🏄🔱🔩🐏🌞🌺🐌👑 🏰🕢💱👖🐶🐥📯🎶💧🍭🔀🎾🏩🍑 massa, est 👶🍭💅🕔🍳💗🍞💚🍩🐷🍘🌄🐇 🐼👀👀🎅 📄🎁🐞💼 placerat erat 💧🌉💻🔣🐥📠🍪📻🐃🔻👠🕘🍞🍈🍔📴 dolor 📁💹🎐🍂🍉 neque, 👗🍁🍪🌵🕛🍄 🕝👔🔼🎡🔺📬 sed 💨💧🏢🐼👘🌳🎼👹🐉🕕 enim 🔏🏁👊💀🔓 🏫🐈💀🌳🔒👵🔘🎺💴🍑 👭🔲🍰💿🔪🌊 🌅🕔🎑📛🔶🔘🎑🍯 🐲📺👬📊🍒 elit, 🌽🎱💇🎥 ultricies 📡💲🐦🌁🍚🌵🍵 🎮📧🔟🍴👕🔏🌴🔊👳🗼💒🌴👍📞🔳👜🔥🐝🐾🐧🍊 🌰🗾🌂🐄 🔛🐀🍏🍞🔔📖💉📼🐥🍱 🐜🎺🍵💿👂🌋🌸 🔞🐤🔟🍀📙🏩👑 porta id 🎄💺🍙👶👪🔪👪🐭💚📗🎅🍐👡🎭 🐦🎌🐆💫 quis nulla dictumst non 📫🍵🔫🕠 🔯🔃🐷📖💙👓💍 🎬🕃🐥🏫🍛🌆🐗👅📷 📚🍭🌞🌎🕐👜👳 🏧💬💴🎆🐄🔠📄🐰📝🌈 🎩🌇🌟📙 suspendisse 🔡👫🍐🏩🌉🕚📘🐮 👐🏁👮🎭🏣 👰📤🍙🏈👓🐰🍦 lacinia diam eu vestibulum donec faucibus 🔝🎴🌷💩🍡🍜💙 🎐🔉🌛📠🏢📄🍆🎆.
Copy link
Member Author

@Princesseuh Princesseuh Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this file in my editor, and well, performance was disastrous, but it's always been, even before this PR, so, it's not too bad (also, it was always mapped incorrectly, before this PR)

Copy link
Member

@bluwy bluwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good to me 👍

internal/sourcemap/sourcemap.go Show resolved Hide resolved
@Princesseuh Princesseuh merged commit f05a7cc into main Aug 6, 2024
5 checks passed
@Princesseuh Princesseuh deleted the fix/utf16-indexes branch August 6, 2024 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 BUG: Astro emmet completions broken with 2.12.8 update, does not offer correct auto complete
2 participants