Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Echo Does Not Correctly Handle URLs with a RawPath #1974

Open
3 tasks done
cedricvanrompay opened this issue Aug 26, 2021 · 9 comments
Open
3 tasks done

Echo Does Not Correctly Handle URLs with a RawPath #1974

cedricvanrompay opened this issue Aug 26, 2021 · 9 comments

Comments

@cedricvanrompay
Copy link

Issue Description

Abstract: func (r *Router) Find(method, path string, c Context) (in file router.go) is sometimes given a raw path instead of a unescaped path, but because it has no way to tell between the two situations it does not behave properly if given a raw path.

Argument path of func (r *Router) Find is provided by func GetPath(r *http.Request) string (in file echo.go) which returns the Path of the request's URL, unless the URL structure has a RawPath in which case it returns the RawPath. A URL structure from package net/url has a RawPath if escape(unescape(RawPath)) != RawPath (where escape applies URL percent-encoding). See net/url source code. The rationale of this extra RawPath field in net/url package is to handle the case of URLs containing %2F, which unescape will transform into character / but which should not be used as a path separator by the server.

What a server should do when it receives an URL which has a RawPath is:

  1. use the RawPath to split the path into segments based on the ”true” / characters (the ones not coming from unescaping %2F)
  2. then, unescape every segment

Because Echo's func (r *Router) Find does not when the path it received is a RawPath, it fails to do step (2), and this causes bugs.

Checklist

  • Dependencies installed
  • No typos
  • Searched existing issues and docs

Expected behaviour vs. Actual behaviour (+ steps to reproduce)

If a JavaScript application sends a request with a path containing and single quote and a space, such as fetch("/foo/bar/what's up"), Echo will return a HTTP 404 Not Found even if there was a handler for "/foo/bar/what's up" or "/foo/bar/:msg". The reasons are that:

  • JavaScript does not escape the single quote, in conformance with RFC 2396 §2.3 (note that this RFC has now been obsoleted by RFC 3986 of which paragraph 2.2 seems to say single quotes should be escaped, but, well, JS doesn't do it).
  • Go's net/url does escape the single quote (see net/url: URL.String() hashbang "#!" encodes as "#%21" golang/go#19917 and golang/go@8a330454)
  • as a result, escape(unescape(RawPath)) ends with what%27s%20up while RawPath ends with what's%20up, so the two don't match and URL's RawPath is set.
  • func GetPath(r *http.Request) string will then pass the raw URL to Find, ending with what's%20up, and since Find never does any unescaping, weird behaviors will appear: a handler for "/foo/bar/what's up" will not match, or path parameter msg will be set to what's%20up instead of what's up.

Of course, this situation can be fixed easily by modifying the JS application to encode the single quote. But strictly speaking it's not the fault of the JS app: it is legal not to escape the single quote, and the server (that is, Echo) should work with unescaped path fragments even if it had to do path splitting on a raw path.

Suggested Solutions

One way would be to change Find's argument path string to pathSegments []string, where the caller of Find would be responsible for splitting the path into segments. This way if the URL has a RawPath the caller can use it to do the splitting, then unescape all parts before sending them to Find.

The problem is that Find currently processes path byte-by-byte, and it would probably require a complete rewrite of Find (and potentially other parts of Echo, such as the logic adding a handler to a router) to operate on path segments.

I don't see any solution that would correctly handle URLs with a RawPath and would not require a big rewrite. A few mitigations I can suggest:

  • add an option to skip func GetPath(r *http.Request) string and always use the URL's Path instead

  • Give Find a way to tell when it got a raw path (raw bool argument?) and then replacing paramValues[paramIndex] = search[:i] with:

    if raw {
        paramValues[paramIndex] = PathUnescape(search[:i])
    } else {
        paramValues[paramIndex] = search[:i]
    }

    This will at least fix the situation for path parameters. It will still break if path segments contains percent-encoded characters.

  • add a warning about this known issue somewhere visible

Version/commit

7f502b1

@aldas
Copy link
Contributor

aldas commented Aug 26, 2021

What does that fetch("/foo/bar/what's up") actually sends as path?

I'm trying that same case with curl and this request will not get even past http server code.

func main() {
	e := echo.New()

	e.GET("/foo/bar/:msg", func(c echo.Context) error {
		return c.String(http.StatusOK, c.Param("msg"))
	})

	if err := e.Start(":8088"); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
x@x:~/code$ curl -v "http://localhost:8088/foo/bar/what's up"
*   Trying 127.0.0.1:8088...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8088 (#0)
> GET /foo/bar/what's up HTTP/1.1
> Host: localhost:8088
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< Connection: close
< 
* Closing connection 0
400 Bad Requesttoim

net-http-request_go-line-1047

https://github.com/golang/go/blob/770df2e18df01e64f8770301b0d3a5d6bfa04027/src/net/http/request.go#L1049

@cedricvanrompay
Copy link
Author

cedricvanrompay commented Aug 26, 2021

It sends /foo/bar/what's%20up. Here it is in Firefox:

Screenshot from 2021-08-26 17-11-26

It seems that your curl does not take care of percent-encoding at all, hence the bad request.

@aldas
Copy link
Contributor

aldas commented Aug 26, 2021

seems so. But I do not seems to get 404 as you do.

x@x:~/code$ curl -v "http://localhost:8088/foo/bar/what's%20up"
*   Trying 127.0.0.1:8088...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8088 (#0)
> GET /foo/bar/what's%20up HTTP/1.1
> Host: localhost:8088
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Date: Thu, 26 Aug 2021 15:16:33 GMT
< Content-Length: 11
< 
* Connection #0 to host localhost left intact
what's%20up

Is your problem 404 or path param value being what's%20up ?

This should be cleared as a first thing. Before we move to solutions we should all understand in a super simple example what is needs to happen "when given input X then output should be Y"

There are historical issues where some need escaping and issued where it is not wanted. Some even unescape path params in middleware, some in handlers. For proxy mw users path param escaping/unescaping is imprtant.

p.s. we are planning to introduce flag for router in v5 to unescape path params in router.

p.s.s. sorry if this is said in first post but it is way easier to start with case you can debug than try to backtrack quite large text of solution/related info.

@aldas
Copy link
Contributor

aldas commented Aug 26, 2021

Maybe using URL to build your request fetch(new URL('http://www.example.com/dogs')) would help with exotic paths. It seems to use RFC3986

@cedricvanrompay
Copy link
Author

Here is an example of Go code:

package main

import (
	"net/http"

	"github.com/labstack/echo/v4"
)

func main() {
	// Echo instance
	e := echo.New()

	// Routes
	e.GET("/", hello)
	e.GET("/foo/bar/what's up", hello)
	e.GET("foo/param/:name", handleParam)

	// Start server
	e.Logger.Fatal(e.Start(":1323"))
}

// Handler
func hello(c echo.Context) error {
	return c.String(http.StatusOK, "Hello, World!")
}

func handleParam(c echo.Context) error {
	return c.String(http.StatusOK, c.Param("name"))
}

Then, in Firefox, I navigate to http://localhost:1323/ and in the JS console I do:

fetch("http://localhost:1323/foo/bar/what's up")
fetch("http://localhost:1323/foo/param/what's up")
  • for the first, fetch, I am expecting a HTTP 200 OK with body Hello World!, but I get HTTP 404 Not Found.
  • for the second fetch, I expect a HTTP 200 OK with body what's up but I get a body with what's%20up instead.

Maybe using URL to build your request fetch(new URL('http://www.example.com/dogs')) would help with exotic paths. It seems to use RFC3986

As I say in the description:

Of course, this situation can be fixed easily by modifying the JS application to encode the single quote. But strictly speaking it's not the fault of the JS app: it is legal not to escape the single quote, and the server (that is, Echo) should work with unescaped path fragments even if it had to do path splitting on a raw path

So yes it is easy to adapt the JS app so that the bug does not trigger anymore, but Echo should still be capable of handling requests with unescaped single quotes to be compliant with the URI standard.

@aldas
Copy link
Contributor

aldas commented Aug 26, 2021

Note to self:

404 and escaped path params are 2 different things.

This is one of those hard problems where change (correct or not) could break expectations that people have built around routing and path params. Going through history it seems that using rawPath for routing is feature that Echo has had very long time.

Some related issues:
#587
#561
#766
#839
#878
#947
#1628
#1798

Things to consider when changing:

  • proxy middleware
  • places where files are served: e.Static() e.File() and Static middleware
  • add/remove slashes middleware
  • redirects?
  • if it comes to very hacky solution consider labeling it as a "feature" (unintentional feature with rare negative user-experience) and not a "bug".

Expectations:

  • currently path params and * params are escaped and people have built logic on it. I think there was issue where someone built middleware to unescape params. Which means some want it to be unescaped.

@ikedam
Copy link

ikedam commented Dec 4, 2021

Hi, I'm facing this issue in my application.
I added a new endpoint and its path may contains characters that need escaping.
I found that echo.Context.Param() behaves inconsistently, that is, sometimes returns unescaped value but sometimes returns escaped value. My application cannot know whether the value is encoded or decoded and have no way to handle parameters via path properly.
I hope this will be fixed.

Proposals:

  • Using func (*URL) EscapedPath looks good for the consistent behavior: https://pkg.go.dev/net/url#URL.EscapedPath
  • If we need consider compatibility...
    • How about introduce new methods echo.Context.UnescapedParam() and echo.Context.EscapedParam()?
    • How about adding a configuration to specify the behavior?:
      • Compatible: same as current. echo.Context.Param() sometimes returns unescaped values, sometimes escaped values.
      • Unescape: echo.Context.Param() always returns unescaped values.
      • Escape: echo.Context.Param() always returns escaped values.

@aldas
Copy link
Contributor

aldas commented Dec 4, 2021

Hi @ikedam

How about adding a configuration to specify the behavior?:

This will be introduced in v5. If you look for RouterConfig.unescapePathParamValues in #2000 you will see.

Using func (*URL) EscapedPath looks good for the consistent behavior: https://pkg.go.dev/net/url#URL.EscapedPath

This is where path is decided for routing purposes. Note URL.RawPath is EscapedPath.
And here is that same path value used as basis where path param values are cut out.

Now in serving static files with e.Static() path is unescaped here we cause of file names. Same happens in Static middleware here

It would be very helpful if you could paste short example here with curl command line example (including problematic url) and describe if problem arises in some of your custom middleware or in handler method.

@ikedam
Copy link

ikedam commented Dec 5, 2021

Thanks! I watch that ticket and wait for v5.

So Param() is generally expected to return unescaped values.
Using EscapedPath() would require additional url.PathUnescape() when cutting out path param values, for example here: https://github.com/labstack/echo/blob/v4.6.1/router.go#L510 .

My case is illustrated here: https://go.dev/play/p/Or-RPfI6me5
I check whether echo.Context.Request().URL.RawPath is set to decide unescape is required as a workaround, and it works correct as far as I tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants