Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hyperlinks with the same ID and different URIs collide when they shouldn't #7698

Closed
PhMajerus opened this issue Sep 22, 2020 · 30 comments · Fixed by #7940
Closed

Hyperlinks with the same ID and different URIs collide when they shouldn't #7698

PhMajerus opened this issue Sep 22, 2020 · 30 comments · Fixed by #7940
Assignees
Labels
Area-VT Virtual Terminal sequence support Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.

Comments

@PhMajerus
Copy link

PhMajerus commented Sep 22, 2020

In Windows Terminal Preview 1.4.2652.0.
Hyperlinks support is a great addition, but from my understanding there is a bug with how links IDs are parsed and handled.

Character cells that have the same target URI and the same nonempty id are always underlined together on mouseover.
The same id is only used for connecting character cells whose URIs is also the same. Character cells pointing to different URIs should never be underlined together when hovering over.
(Source: https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#hover-underlining-and-the-id-parameter )

But testing the same ID with different URIs shows Windows Terminal is grouping them together and using the URI of the last one for both :
echo -e '\e]8;id=a;http://www.microsoft.com/\aMicrosoft\e]8;;\a \e]8;id=a;http://www.google.com/\aGoogle\e]8;;\a'

image

Even different IDs are grouped together if they are not numbers:
echo -e '\e]8;id=a;http://www.microsoft.com/\aMicrosoft\e]8;;\a \e]8;id=b;http://www.google.com/\aGoogle\e]8;;\a'

While the page about hyperlinks explicitely shows and states that these IDs are strings, not numbers:

params is an optional list of key=value assignments, separated by the : character. Example: id=xyz123:foo=bar:baz=quux.

Complex apps that display data that might itself contain OSC 8 hyperlinks (such as terminal multiplexers, less -R) should do the following: If the encountered OSC 8 hyperlink already has an id, they should prefix it with some static string, or if multiple windows/panes are supported by the app, a prefix that's unique to that window/pane to prevent conflict with other windows/panes.

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Sep 22, 2020
@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

So, a (uri 1) a (uri 2) is operating as designed. The reuse of an ID with a new URI is utterly intended to replace the old ID.

I can't reproduce your second issue, a (uri 1) b (uri 2):

image

Complex apps ...

Since Terminal's "terminals" are individual entities with their own hyperlink stores and user interactivity, we don't fall into the "display data that might itself contain OSC 8 hyperlinks" case and are therefore not in need of namespacing. Two hyperlinks in two different panes are automatically in different isolated roots:

image

note only the link in the right pane is highlighted

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

The same id is only used for connecting character cells whose URIs is also the same.

Ah. HM.

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

Why would there be an ID parameter if it cannot be used to disambiguate URIs? Why would you reuse the same ID? I've got so many questions.

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

@DHowett Yeah, the one with different ids a and b and different URIs seems to be processed as separate, not sure what I did when testing, sorry.

Why would there be an ID parameter if it cannot be used to disambiguate URIs? Why would you reuse the same ID? I've got so many questions.

That one I can answer!
Imagine a CUI utility that shows several files and wants to hyperlink them, you don't want to generate lots of IDs that could collide with IDs that were already output by other CUI utilities.
So that utility could generate a single random ID for its complete output, but have different URIs for each file, instead of having to generate as many IDs as it has to show hyperlinks that are guaranteed to have different URIs.
If the ID generation is based on milliseconds, it's better not to generate a lot of them within a single utility.

This is exactly what I'm doing in one of my utility to have hyperlinks spanning several lines of a grid:
image

They all share a single ID, but URIs are differentiating them.

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

interesting! I'd've never expected that that might rely on {id, uri} tuples that differed only on the uri

@PhMajerus
Copy link
Author

Yeah, I'm basically using a link ID generator function and try to keep the same ID for a single output.
Calling that ID generator on each line would be uneccessary and pollute the IDs hash table for no good reason.
image

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

I think that's just hiding the pollution somewhere you don't have control over ;P because somewhere the terminal emulator needs to store a list of IDs and how they map to URIs, and unique across different URIs, which I expect inevitably pollutes somebody's ID hash table... right?

My quick fix is...

diff --git a/src/buffer/out/textBuffer.cpp b/src/buffer/out/textBuffer.cpp
index 294813a3f..cac3718f4 100644
--- a/src/buffer/out/textBuffer.cpp
+++ b/src/buffer/out/textBuffer.cpp
@@ -2279,32 +2279,35 @@ std::wstring TextBuffer::GetHyperlinkUriFromId(uint16_t id) const
     else
     {
         // assign _currentHyperlinkId if the custom id does not already exist
-        const auto result = _hyperlinkCustomIdMap.emplace(params, _currentHyperlinkId);
+        std::wstring newId{ id };^M
+        // hash the URL and add it to the custom ID - GH#7698^M
+        newId += L"%" + std::to_wstring(std::hash<std::wstring_view>()(uri));^M
+        const auto result = _hyperlinkCustomIdMap.emplace(newId, _currentHyperlinkId);^M

which is pretty much that ;P

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

sidebar

You're probably more impacted by the fact that we reject file URIs, right? We were having a long discussion over on #7526 about it, and never came to anywhere we felt comfortable with. The same discussion is playing out in #7668, because of how hard it is for the Terminal alone to map paths...

@DHowett DHowett changed the title Hyperlinks IDs handling error Hyperlinks with the same ID and different URIs collide when they shouldn't Sep 22, 2020
@DHowett DHowett added Area-VT Virtual Terminal sequence support Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Sep 22, 2020
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Sep 22, 2020
@DHowett DHowett added this to the Terminal Backlog milestone Sep 22, 2020
@PankajBhojwani
Copy link
Contributor

Thank you for pointing this out to us!

Imagine a CUI utility that shows several files and wants to hyperlink them, you don't want to generate lots of IDs that could collide with IDs that were already output by other CUI utilities.
So that utility could generate a single random ID for its complete output, but have different URIs for each file, instead of having to generate as many IDs as it has to show hyperlinks that are guaranteed to have different URIs.
If the ID generation is based on milliseconds, it's better not to generate a lot of them within a single utility.

Yeah, I'm basically using a link ID generator function and try to keep the same ID for a single output.
Calling that ID generator on each line would be uneccessary and pollute the IDs hash table for no good reason.

Question: why generate an ID at all? You could just send an OSC 8 sequence with no id parameter and achieve the desired functionality.

If we differentiate links by both id and uri, then we lose the functionality of being able to overwrite previous link URIs by providing the same ID - which I think could potentially be useful in some cases (?)

@PhMajerus
Copy link
Author

Yeah, you'll have to store the whole {id,uri} tuple as the unique ID, but I believe it makes sense as since they define what's essentially part of the same link, they cannot have different targets, and updating the target with the latest one received would be a weird behavior and open the door to apps that could end up updating the target right between the time you checked the URI in the tooltip and when you cliked to open it.

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

Oh, and the following wouldn't work:

Complex apps that display data that might itself contain OSC 8 hyperlinks (such as terminal multiplexers, less -R) should do the following: If the encountered OSC 8 hyperlink already has an id, they should prefix it with some static string, or if multiple windows/panes are supported by the app, a prefix that's unique to that window/pane to prevent conflict with other windows/panes.

An app like tmux would receive output from utilities that do not generate any ID and only rely on the different URIs, but if the link had to be wrapped on two lines within a pane, tmux would then have to give them ids to bind them together, but they'll end up all having the same id by prefixing these empty IDs with its own pane ID.

@PankajBhojwani
Copy link
Contributor

Right that makes sense, thanks for the explanation! Seems like we'll probably go with @DHowett's fix since it essentially uses the {id,uri} tuple as an ID.

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

I think the operant part of that sentence though is "already has an ID" -- in the no-ID case, tmux would pass it through without an ID, and there wouldn't be any chance of collision since it's uniqued by the TE.

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

Imagine one that has
+--------+--------+
| pane 1 | pane 2 |
+--------+--------+
And multiplexer has to show the some output that has to be wrapped:
+--------+---------+
| pane 1 | hyperli |
| - -- - | nk1 - - |
| - -- - | hyperli |
| - -- - | nk2 - - |
+--------+---------+
To group these by link, it has to assign them ids relative to the pane they appear in, but original utility didn't have to since it was just outputting them for a single screen.

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

This may be hypothetical -- I can't get tmux to actually produce a link. I think if it can track a region of text that it knows to be a hyperlink, it's going to have to assign an ID. That doesn't mean it's gotta use the same ID for every hyperlink (?)

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

The specification goes on to suggest how tmux might handle this -- assigning a new mangled ID every time an un-ID'd hyperlink is encountered. That's what conhost/conpty does, as well.

It seems like dropping IDs is safer for uniqueness (even across sessions of AXSH) than a static ID if it's not ever intending regions of text to highlight together! /shrug

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

(That doesn't mean I'm not going to fix this bug! It's definitely wrong, but we're in the weeds of talking about how your application can recover while we're wrong.)

@PhMajerus
Copy link
Author

This may be hypothetical -- I can't get tmux to actually produce a link. I think if it can track a region of text that it knows to be a hyperlink, it's going to have to assign an ID. That doesn't mean it's gotta use the same ID for every hyperlink (?)

True, but generating an ID prefix that simply encodes the pane ID saves it from having to keep track of all the links encountered, it can just keep track of its panes and how to prefix links based on panes.

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

But then that violates another quality of the spec: that two non-ID'd hyperlinks pointing to the same URI do not underline together.

Admittedly info on the ground here is thin, but regarding adjacent links with no ID but the same URI:

Here VTE and iTerm2 differ, but from a practical point of view, this difference should not matter.

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

Question: why generate an ID at all? You could just send an OSC 8 sequence with no id parameter and achieve the desired functionality.

Because in that grid() function, the whole tile of each ANSI-art icon + label acts as a single hyperlink, so these links span several lines and are entertwined with other grid cells on each line. Not providing IDs would work when you click, but wouldn't have the hover highlight on the whole tile on mouse over.

If we differentiate links by both id and uri, then we lose the functionality of being able to overwrite previous link URIs by providing the same ID - which I think could potentially be useful in some cases (?)

I believe that would be against specs, incompatible behavior with other terminals and risky as new output could modify a previous link, including some link shown by another utility in the same scrollback buffer if ID collision occurs between utilities in the same shell.
Using {id,uri} as the unique ID means you might have the highlight effect spanning several outputs if ids and uris collide, but at least it's only a visual glitch, you can't have an utility take over another utility's output targets.

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

It seems like dropping IDs is safer for uniqueness (even across sessions of AXSH) than a static ID if it's not ever intending regions of text to highlight together! /shrug

True, I don't generate ids if a link is on a single line, only if they span several lines.
For example, my dir() function doesn't generate any id since each item appear on a single line
image

The only issue with that one is that the file:// URI isn't supported according to Windows Terminal while Windows 10 does have a protocol handler for file: URIs:
image

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

Ah, the icon. That's the part I was missing.

Isn't it annoying that they're covered in dashed lines? 😁

@DHowett
Copy link
Member

DHowett commented Sep 22, 2020

Yeah, check the sidebar I linked above 😉

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

Isn't it annoying that they're covered in dashed lines? 😁

Yeah, it is, other terminals I've tried only underline on hover which was better looking but less discoverable.

The docs page did hint that underline would be on hover only:

There's a nontrivial question though: Which cells to underline on hover? As opposed to webpages, we lack the semantics, the information about the cells that belong together and form a single web anchor.

But I respect your artistic decision here... as long as you give us a switch in the profile to turn it off 😋

@PhMajerus
Copy link
Author

Yeah, check the sidebar I linked above 😉

Ah, yeah, that's bad news, I had hopes to let users just click a file from a directory/list utility if they wanted to just open the file on their desktop.
Guess I'll have to just write a protocol handler with some random name that delegates the job to the original file: handler class to work around it, making your security moot and links less clear to users 😆

@PhMajerus
Copy link
Author

PhMajerus commented Sep 22, 2020

Since the Terminal cannot guess which LX instance some output is coming from, how about...

You intercept outputs at the LX subsystem level and replace any link with "file://*" protocol by something like "wslfile://[lx_instance_id]/[original_linux_path]" before letting it cross to the Win32 side.
Then, you write a Win32 protocol handler registered for wslfile URLs that connects to the lx_instance_id and launches the file original_linux_path within the original instance.

Or if you don't want to write a protocol handler to integrate it with apps on the Linux side, maybe you could just replace any file://* with file://wsl$/instance/* to let Windows apps access these links ? (I'd find the protocol handler more elegant)

Of if you don't want to change the links on the fly at the LX subsystem level, you could intercept file: URIs in Terminal as you're doing today, but allow each profile to specify a replacement protocol handler. Win32 shells could just reference the existing system file protocol handler to get their links working, while WSL shells could use the wslfile handler. WT would then swap the protocol part and pass it along to the requested handler.

@ghost ghost added the In-PR This issue has a related PR label Oct 16, 2020
@DHowett
Copy link
Member

DHowett commented Oct 16, 2020

I finally got off my behind and put that patch into PR.

@DHowett DHowett self-assigned this Oct 16, 2020
@PhMajerus
Copy link
Author

Thank you! 😊

DHowett added a commit that referenced this issue Oct 16, 2020
It turns out that we missed part of the OSC 8 spec which indicated that
_hyperlinks with the same ID but different URIs are logically distinct._

> Character cells that have the same target URI and the same nonempty id
> are always underlined together on mouseover.
> The same id is only used for connecting character cells whose URIs is
> also the same. Character cells pointing to different URIs should never
> be underlined together when hovering over.

This pull request fixes that oversight by appending the (hashed) URI to
the generated ID.

When Terminal receives one of these links over ConPTY, it will hash the
URL a second time and therefore append a second hashed ID. This is taken
as an acceptable cost.

Fixes #7698
@ghost ghost closed this as completed in #7940 Oct 16, 2020
@ghost ghost added Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels Oct 16, 2020
ghost pushed a commit that referenced this issue Oct 16, 2020
It turns out that we missed part of the OSC 8 spec which indicated that
_hyperlinks with the same ID but different URIs are logically distinct._

> Character cells that have the same target URI and the same nonempty id
> are always underlined together on mouseover.
> The same id is only used for connecting character cells whose URIs is
> also the same. Character cells pointing to different URIs should never
> be underlined together when hovering over.

This pull request fixes that oversight by appending the (hashed) URI to
the generated ID.

When Terminal receives one of these links over ConPTY, it will hash the
URL a second time and therefore append a second hashed ID. This is taken
as an acceptable cost.

Fixes #7698
DHowett added a commit that referenced this issue Oct 19, 2020
It turns out that we missed part of the OSC 8 spec which indicated that
_hyperlinks with the same ID but different URIs are logically distinct._

> Character cells that have the same target URI and the same nonempty id
> are always underlined together on mouseover.
> The same id is only used for connecting character cells whose URIs is
> also the same. Character cells pointing to different URIs should never
> be underlined together when hovering over.

This pull request fixes that oversight by appending the (hashed) URI to
the generated ID.

When Terminal receives one of these links over ConPTY, it will hash the
URL a second time and therefore append a second hashed ID. This is taken
as an acceptable cost.

Fixes #7698

(cherry picked from commit df7c3cc)
@ghost
Copy link

ghost commented Nov 11, 2020

🎉This issue was addressed in #7940, which has now been successfully released as Windows Terminal v1.4.3141.0.:tada:

Handy links:

@ghost
Copy link

ghost commented Nov 11, 2020

🎉This issue was addressed in #7940, which has now been successfully released as Windows Terminal Preview v1.5.3142.0.:tada:

Handy links:

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants