[Uri] Uri.IsWellFormedUriString() returns false for a URL which is correct #21626

Closed
flo8 opened this issue May 11, 2017 · 24 comments

@flo8

flo8 commented May 11, 2017

I have a C# (.NET Core 1.1) app that needs to check whether a URL is valid. I use Uri.IsWellFormedUriString(), which works pretty well, but I have a doubt about the call below, which returns false. It seems to me that the URL is perfectly valid.

Uri.IsWellFormedUriString("http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole%2C+gaz%2C+mines", UriKind.Absolute)

I passed the very same URL to the PHP function below, which says the URL is correctly formatted:

filter_var($url, FILTER_VALIDATE_URL)

According to RFC 3986, this URL appears to be correct. Am I missing something here?

@davidsh
Contributor

davidsh commented May 11, 2017

Do you know what the behavior is on .NET Framework? In general, .NET Core behaves like .NET Framework.

@flo8
Author

flo8 commented May 11, 2017

@davidsh Indeed, I just checked and get the same behavior on .NET 4.5.2.

However this doesn't explain why this function returns false for this URL?

@davidsh
Contributor

davidsh commented May 11, 2017

Thx for confirming .NET Framework behavior. This will have to be investigated to see why this is returning false.

@mazong1123
Contributor

I just found that whenever a '%' appears in a URL, Uri.IsWellFormedUriString() returns false. I hope this can be a starting point for investigating this issue.

@svick
Contributor

svick commented May 11, 2017

This still reproduces if you shorten the URI to "http://www.test.com/%C3%A9%2C".

As far as I can tell, this happens because the URI contains both the encoded character "é" (%C3%A9) and an encoded comma (%2C). That combination causes the internal _flags to contain E_PathNotCanonical but not PathIriCanonical, which in turn means false is returned here.

If you don't encode the comma (i.e. "http://www.test.com/%C3%A9," and "http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole,+gaz,+mines"), then it returns true.

I have no idea if this behavior is correct.
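
A minimal sketch for re-running the comparison described above on whatever runtime you have at hand. The class name `SvickRepro` and the labels are made up for illustration; only the two URI strings come from this thread. The result of `Uri.IsWellFormedUriString` is printed rather than hard-coded, since it may differ between runtime versions.

```csharp
using System;

public static class SvickRepro
{
    // The two spellings quoted above. Labels describe the input, not the API result.
    public static readonly (string Label, string Text)[] Cases =
    {
        ("encoded é + encoded comma", "http://www.test.com/%C3%A9%2C"),
        ("encoded é + literal comma", "http://www.test.com/%C3%A9,"),
    };

    public static void Main()
    {
        foreach (var (label, text) in Cases)
        {
            // Both spellings decode to the same characters; only the escaping differs.
            string decoded = Uri.UnescapeDataString(text);
            bool wellFormed = Uri.IsWellFormedUriString(text, UriKind.Absolute);
            Console.WriteLine($"{label}: {text} (decodes to {decoded}) -> {wellFormed}");
        }
    }
}
```

If the two lines print different results, the check is reacting to how the comma is escaped rather than to the characters the URI represents.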

@m7md7sien

m7md7sien commented Mar 1, 2018

As @svick said, I managed to overcome this issue by decoding the URL before checking it.

string decodedUrl = HttpUtility.UrlDecode(url);
Uri.IsWellFormedUriString(decodedUrl, UriKind.RelativeOrAbsolute);

@rmkerr rmkerr changed the title Uri.IsWellFormedUriString() returns false for a URL which is correct [Uri] Uri.IsWellFormedUriString() returns false for a URL which is correct May 2, 2018
@rmkerr
Contributor

rmkerr commented May 15, 2018

My best guess here is that somewhere in the code we are checking the string for encoded non-reserved characters, and that check incorrectly considers commas to be unreserved.

This should be a fairly simple issue to address for someone that wants to learn more about URI, so I'll mark this as up for grabs. If it lasts too long without getting picked up, I'll go ahead and fix it.

@hades200082

I'm seeing this behaviour in a .NET Framework 4.5 project also.

@karelz
Member

karelz commented Jun 13, 2018

@hades200082 we are not tracking .NET Framework bugs in the CoreFX repo.
Just to set expectations: the bar for .NET Framework fixes is high, to preserve compatibility. If multiple customers are hitting it badly, there is no reasonable workaround, and the fix is low-risk (sadly, Uri changes tend to introduce new regressions quite often), it may have a chance of being fixed in a future .NET Framework release. Let us know if that is the case.

@nicholasb90

nicholasb90 commented Apr 4, 2019

In .NET Core 2.1 I am also encountering what looks to be the same bug, or a very similar one.

var uri = @"https://maps.googleapis.com/maps/api/geocode/json?address=%2C%2CMontr%C3%A9al%2CQuebec%2CCanada&sensor=false";
            
Uri.IsWellFormedUriString(uri, UriKind.Absolute); //returns false, however above URI is valid.

However, if I leave the URI unencoded it passes the IsWellFormedUriString check:

var uri = @"https://maps.googleapis.com/maps/api/geocode/json?address=,,Montréal,Quebec,Canada&sensor=false";
            
Uri.IsWellFormedUriString(uri, UriKind.Absolute); //returns true

@karelz
Member

karelz commented Apr 5, 2019

@nicholasb90 can you please create a minimal repro? (as in "shortest problematic Uri possible")
If everyone does that, it will be much easier to judge what is a duplicate of what ...

@karelz
Member

karelz commented Apr 5, 2019

cc @wtgodbe

@nicholasb90

nicholasb90 commented Apr 8, 2019

@karelz

I suspect the issue is related to combining encoded characters that require one encode value and characters that require multiple encode values. For example, 学 encodes to %E5%AD%A6 while [ encodes to %5B.

Here are some examples:

using System;
using Xunit;

public class UriTests
{
    [Fact] // Fails
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithCommaAndAccentCharacter()
    {
        var uri = @"http://g.c/j?a=%2C%C3%A9"; // encoded characters in query: ,é

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithComma()
    {
        var uri = @"http://g.c/j?a=%2C"; // encoded characters in query: ,

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithAccentCharacter()
    {
        var uri = @"http://g.c/j?a=%C3%A9"; // encoded characters in query: é

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Fails
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithOpenBracketAndDoubleByteCharacter()
    {
        var uri = @"http://g.c/j?a=%E5%AD%A6%5B"; // encoded characters in query: 学[

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithOpenBracket()
    {
        var uri = @"http://g.c/j?a=%5B"; // encoded characters in query: [

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithDoubleByteCharacter()
    {
        var uri = @"http://g.c/j?a=%E5%AD%A6"; // encoded characters in query: 学
        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }
}
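
The hypothesis above can be turned into a small diagnostic. The sketch below only inspects the percent-escapes in a string and reports whether it mixes escaped ASCII bytes (such as %2C) with escaped non-ASCII bytes (such as %C3%A9). The class and method names are hypothetical, and this is a heuristic built on the hypothesis in this thread, not a reproduction of what System.Uri does internally.

```csharp
using System;
using System.Globalization;

public static class EscapeMixCheck
{
    // Returns true when the string contains both a percent-escaped ASCII byte (%00-%7F)
    // and a percent-escaped non-ASCII byte (%80-%FF). Malformed escapes are ignored.
    public static bool MixesAsciiAndNonAsciiEscapes(string uri)
    {
        bool sawAscii = false, sawNonAscii = false;
        for (int i = 0; i + 2 < uri.Length; i++)
        {
            if (uri[i] != '%') continue;

            // Parse the two characters after '%' as a hexadecimal byte.
            if (!byte.TryParse(uri.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out byte value))
                continue;

            if (value < 0x80) sawAscii = true; else sawNonAscii = true;
            i += 2; // skip the two hex digits just consumed
        }
        return sawAscii && sawNonAscii;
    }

    public static void Main()
    {
        Console.WriteLine(MixesAsciiAndNonAsciiEscapes("http://g.c/j?a=%2C%C3%A9")); // True
        Console.WriteLine(MixesAsciiAndNonAsciiEscapes("http://g.c/j?a=%2C"));       // False
        Console.WriteLine(MixesAsciiAndNonAsciiEscapes("http://g.c/j?a=%C3%A9"));    // False
    }
}
```

A helper like this could be used to log which inputs fall into the suspected category before deciding how to validate them.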

@RubenZim

RubenZim commented Apr 17, 2019

Try enabling IDN and IRI parsing in your App.config by adding this to your configuration section, to ensure correct handling of international character sets:

<uri>
<idn enabled="All"/>
<iriParsing enabled="true"/>
</uri>

After doing this, you should create a decoded version of your URL like this to avoid complications between encoded and decoded URLs:

string decodedURL = HttpUtility.UrlDecode(yourURLString);

Now you can check like this:

if (Uri.IsWellFormedUriString(yourURLString, UriKind.Absolute) || Uri.IsWellFormedUriString(decodedURL, UriKind.Absolute))

Maybe this is not a perfect solution, but it is the closest I could get to making this work as reliably as possible.

Btw. I'm using .Net Framework 4.5.2, but I guess it should also work with lower versions.

@tmenier

tmenier commented Sep 7, 2019

I just ran into this one. You probably have plenty of examples, but just to further confirm @nicholasb90's hypothesis:

Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%26", UriKind.Absolute)); // pass
Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%C3%A9", UriKind.Absolute)); //pass
Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%26%C3%A9", UriKind.Absolute)); //fail

Is using Uri.UnescapeDataString on the string before testing it a recommended workaround? It makes my example pass, but I'm not sure whether there are pitfalls.
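
One pitfall worth checking before relying on that approach: Uri.UnescapeDataString decodes every percent-escape, including escaped delimiters, so the decoded string can describe a different URI than the original. The sketch below shows this with a made-up input; the class name `UnescapePitfall` and the second URL are illustrative assumptions, not taken from this thread.

```csharp
using System;

public static class UnescapePitfall
{
    // Thin wrapper so the demo and any checks go through one place.
    public static string Decode(string uri) => Uri.UnescapeDataString(uri);

    public static void Main()
    {
        // The failing example from above decodes to a literal '&' followed by 'é'.
        Console.WriteLine(Decode("http://myhost.com/%26%C3%A9"));

        // Escaped '/', '?' and '#' become real delimiters after decoding:
        // one path segment turns into a path, a query and a fragment.
        Console.WriteLine(Decode("http://myhost.com/a%2Fb%3Fc%23d")); // http://myhost.com/a/b?c#d
    }
}
```

Validating the decoded string therefore answers a slightly different question than validating the original, which may or may not matter for a given application.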

@karelz
Member

karelz commented Oct 2, 2019

Triage: This will be a breaking change, so we will have to document it at minimum.

@FaizulHussain

I've had similar issues. In my case the method IsWellFormedUriString failed if it contained %2D instead of hyphen character (-)

@karelz
Member

karelz commented Nov 1, 2019

@FaizulHussain can you please update your reply with the code (e.g. like in https://github.com/dotnet/corefx/issues/19630#issuecomment-529069574)? It will be harder to miss in future.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@julian94

I just ran into this issue at work and can add that combining non-ASCII characters like Å or ตั with any of the RFC 3986 section 2.2 reserved characters (! * ' ( ) ; : @ & = + $ , / ? # [ ]) fails.
E.g. Å*

SubZero0 pushed a commit to discord-net/Discord.Net that referenced this issue Apr 28, 2021
`Uri.IsWellFormedUriString()` doesn't return the expected result for specific urls, removed until the DotNet team actually resolves it ( dotnet/runtime#21626 )
@andre-ss6

We also just encountered this issue.

And I just noticed this is up-for-grabs. A friend and I could be interested in doing this. @karelz is everything good for a PR? And on the documentation part, whose responsibility would that be: the PR author's, or yours? I imagine the latter?

@andre-ss6

For anyone who has also encountered this issue on .NET Core, this may be of help: we tried every solution proposed out there (including the ones mentioned in this issue) and none worked for us.

We thought of calling other languages' libraries via interop, using regex, and other things, but ideally we wanted to stick to the .NET implementation.

Thus, we came up with a workaround for the time being (until the underlying bug is fixed):

  • Test if the string is a well formed relative URI (since the bug appears to only apply to absolute URIs)
  • If it's not, test if it's a well formed absolute URI
  • If it's also not, then we Uri.TryParse the string, and check:
    • If the string was already URI encoded (for example, by decoding it and checking if it returns the same string)
    • If the scheme + user info + host forms a well formed absolute URI (the bug might still manifest itself here)
    • If the path + query + fragment forms a well formed relative URI (relative URIs are not affected by the bug)

This won't work for cases where the offending character combinations are present in the user info or host (such as http://té%40t.com/ or http://té%40t@domain.com/). However, for us at least, having this work for the path+query+fragment part of the string was already enough.
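
The steps above could be sketched roughly as follows. This is one possible reading, not the commenter's actual code: the names `LenientUriCheck` and `IsProbablyWellFormed` are hypothetical, Uri.TryCreate stands in for the parsing step, and "already URI encoded" is interpreted here as "decoding changes the string", which is an assumption.

```csharp
using System;

public static class LenientUriCheck
{
    public static bool IsProbablyWellFormed(string candidate)
    {
        if (string.IsNullOrEmpty(candidate)) return false;

        // Steps 1 and 2: the built-in checks, relative first, then absolute.
        if (Uri.IsWellFormedUriString(candidate, UriKind.Relative)) return true;
        if (Uri.IsWellFormedUriString(candidate, UriKind.Absolute)) return true;

        // Step 3: parse the string and validate its parts separately.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out var parsed)) return false;

        // Assumed reading of "already URI encoded": the string contains escapes to decode.
        bool containsEscapes = Uri.UnescapeDataString(candidate) != candidate;

        // Scheme + user info + host (GetLeftPart also includes the port, if any).
        string authorityPart = parsed.GetLeftPart(UriPartial.Authority);

        // Path + query + fragment, checked as a relative URI.
        string rest = parsed.PathAndQuery + parsed.Fragment;

        return containsEscapes
            && Uri.IsWellFormedUriString(authorityPart, UriKind.Absolute)
            && Uri.IsWellFormedUriString(rest, UriKind.Relative);
    }

    public static void Main()
    {
        // The result for inputs that reach step 3 depends on the runtime version, so it is only printed.
        Console.WriteLine(IsProbablyWellFormed("http://myhost.com/%26%C3%A9"));
    }
}
```

As noted above, a check like this still inherits the bug for offending combinations in the user info or host, because that part is validated as an absolute URI.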

@karelz
Member

karelz commented Nov 5, 2021

It has a total of 13 customer reports (1 was offline), but only 2 are upvotes of the top post (1 is the original post).

Can I ask everyone who has hit it to please upvote the top post? It will help us prioritize.

Moving it to .NET 7.0 as it has a rather large impact. cc @MihaZupan

@marstaik

This URI is also failing on .NET 6.0

https://storage.googleapis.com/kocmoc/audio-archive/2021_10/2021_10_13%20-%20The%20Last%20Dive%20with%20%CE%91%CE%A7%CE%99%CE%9D%CE%9F%CE%99.mp3

@karelz .NET 6.0 was just released... having to wait for .NET 7 for a critical bug that should have been fixed 4 years ago is a bit ridiculous.

@MihaZupan MihaZupan mentioned this issue Jan 25, 2022
AntiGuideAkquinet added a commit to AkquinetRistec/odata.net that referenced this issue Apr 13, 2022
When initializing a DataServiceContext there was a mandatory URI check
relying on .NETs Uri.IsWellFormedUriString. Sadly there is an active
issue on the .NET side that incorrectly flags URIs like
"http://192.168.0.1:1234/Instance/ODataV4/Company('123-Customer Place
Süd-Ost')/" as being invalid. The progress is tracked here:
dotnet/runtime#21626
To give the user a simple option to prevent this issue from hindering
further development with the library I added a simple bool to ignore
this error.
julian94 pushed a commit to julian94/runtime that referenced this issue Jun 2, 2022
Uri.IsWellFormedUri() reports a false negative when mixing characters
like Å or ตั together with any of the RFC 3986 section 2.2 Reserved
Characters, ! * ' ( ) ; : @ & = + $ , / ? # [ ].

This change adds a (failing) unit test for this bug.

Tests dotnet#21626
@MihaZupan MihaZupan removed the help wanted [up-for-grabs] Good issue for external contributors label Jul 20, 2022
@MihaZupan
Member

Closing in favour of #72632

@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2022