Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probe-engine: web_connectivity 0.5 DNS queries sometimes contains an empty hostname #2628

Closed
hellais opened this issue Nov 14, 2023 · 6 comments · Fixed by ooni/probe-cli#1480

Comments

@hellais
Copy link
Member

hellais commented Nov 14, 2023

In working on adding support for web connectivity 0.5 in ooni/data (ooni/data#39) I noticed that some measurements don't contain the hostname field in the case of failed queries see for example: https://explorer.ooni.org/m/20231101133643.264648_BR_webconnectivity_39f7c5c2cea6cd96

Screenshot 2023-11-14 at 14 53 24

While we should fix it, it would also be good to have some pointers on what you believe should be the right way to handle these kinds of measurements in ooni/data to so we can fix the data to the extent that is possible.

@bassosimone
Copy link
Contributor

The core issue here is that the redirect is for http://, which contains no domain. 🤔

hellais added a commit to ooni/data that referenced this issue Nov 21, 2023
Implements: #42

Changes:
* Some refactoring to the web_analysis to align with new experiment result model
* Refactor feature based analysis into dask
* Rename create table query to WebAnalysis
* Implement workaround for ooni/probe#2628
* Implement probe_cc based load spreading
* Fix start_analysis signature
* Kill some dead code
* Setup system for running tests on web_analysis
* Add checks for ensuring analysis and observations agree on measurement counts
* Refactor and add tests for make_cc_batches
* Add tests for ground truth generation
* A lot of bugfixing and testing improvements
@bassosimone
Copy link
Contributor

We are going to write a test case for that and see what happens when using Web Connectivity v0.4.

@bassosimone
Copy link
Contributor

With ooni/probe-cli#1479, we can now reproduce the issue.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 31, 2024
This PR improves webconnectivityqa to reproduce the
ooni/probe#2628 issue.

We model the v0.4 behavior as the correct behavior. We need to extend
webconnectivitylte to correctly handle this case.
@bassosimone
Copy link
Contributor

bassosimone commented Jan 31, 2024

BTW, this issue is much more difficult than I expected so I'm bumping its complexity.

Here's a summary of my understanding so far:

  1. Web Connectivity v0.4 fails when redirecting with unknown_failure: http: no Host in request URL;
  2. Web Connectivity v0.5 fails with dns_nxdomain_error because the redirect URL has an empty domain.

We need to somehow find a way for the following to happen:

  1. the unknown failure should be mapped to a know failure so we don't mark the measurement as failed;
  2. we need to figure out the proper way to report error for v0.5, because obviously it's wrong now.

@bassosimone
Copy link
Contributor

So, let's perhaps see what happens when we're using other programs and we get an http:// redirect.

Here's the test program:

package main

import (
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "http://", http.StatusFound)
	})

	http.ListenAndServe(":8080", nil)
}

Here's what happens with cURL:

curl --location -v http://127.0.0.1:8080/
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080
> GET / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 302 Found
< Content-Type: text/html; charset=utf-8
< Location: http://
< Date: Wed, 31 Jan 2024 14:11:53 GMT
< Content-Length: 30
< 
* Ignoring the response-body
* Connection #0 to host 127.0.0.1 left intact
* The redirect target URL could not be parsed: No host part in the URL
curl: (3) The redirect target URL could not be parsed: No host part in the URL

It looks similar to how net/http fails.

This is instead what happens when using Firefox:

Screenshot 2024-01-31 at 15 29 03

These tests confirm my initial hypothesis that the following text from RFC 9110 Sect 10.2.2:

The "Location" header field is used in some responses to refer to a specific resource in relation to the response. The type of relationship is defined by the combination of request method and status code semantics.

Location = URI-reference

The field value consists of a single URI-reference. When it has the form of a relative reference ([URI], Section 4.2), the final value is computed by resolving it against the target URI ([URI], Section 5).

and the following text from RFC 3986 Sect 4.2:

A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.

  relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

  relative-part = "//" authority path-abempty
                / path-absolute
                / path-noscheme
                / path-empty

The URI referred to by a relative reference, also known as the target
URI, is obtained by applying the reference resolution algorithm of
Section 5.

And in turn this other part of RFC 3986 Sect 3.2:

Many URI schemes include a hierarchical element for a naming
authority so that governance of the name space defined by the
remainder of the URI is delegated to that authority (which may, in
turn, delegate it further). The generic syntax provides a common
means for distinguishing an authority based on a registered name or
server address, along with optional port and user information.

The authority component is preceded by a double slash ("//") and is
terminated by the next slash ("/"), question mark ("?"), or number
sign ("#") character, or by the end of the URI.

authority = [ userinfo "@" ] host [ ":" port ]

all seem to suggest that it's unexpected for the host to be empty there.

@bassosimone
Copy link
Contributor

Given all of the above, I conclude that Web Connectivity v0.4 behavior of failing is correct. Now, the only remaining task is to make sure that we make Web Connectivity v0.5 fail in such a way that we can correctly attribute the failure.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 31, 2024
This diff ensures that webconnectivitylte is able to handle malformed
redirect URLs such as (literally) `http://` and `https://`.

The way in which we do this is slightly different from v0.4 and possibly
more accurate in that it attributes the error to the operation where we
detect the error rather than later on in the next redirect.

Because of that, I made the QA suite conform to v0.5's behavior.

Closes ooni/probe#2628
bassosimone added a commit to ooni/spec that referenced this issue Jan 31, 2024
…285)

## Checklist

- [x] I have read the [contribution
guidelines](https://github.com/ooni/spec/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request:
ooni/probe#2628
- [x] related ooni/probe-cli pull request:
ooni/probe-cli#1480
- [x] If I changed a spec, I also bumped its version number and/or date:
not needed
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 31, 2024
This diff ensures that webconnectivitylte is able to handle malformed
redirect URLs such as (literally) `http://` and `https://`.

The way in which we do this is slightly different from v0.4 and possibly
more accurate in that it attributes the error to the operation where we
detect the error rather than later on in the next redirect.

Because of that, I made the QA suite conform to v0.5's behavior.

Closes ooni/probe#2628

Related ooni/spec PR: ooni/spec#285
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
This PR improves webconnectivityqa to reproduce the
ooni/probe#2628 issue.

We model the v0.4 behavior as the correct behavior. We need to extend
webconnectivitylte to correctly handle this case.
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
This diff ensures that webconnectivitylte is able to handle malformed
redirect URLs such as (literally) `http://` and `https://`.

The way in which we do this is slightly different from v0.4 and possibly
more accurate in that it attributes the error to the operation where we
detect the error rather than later on in the next redirect.

Because of that, I made the QA suite conform to v0.5's behavior.

Closes ooni/probe#2628

Related ooni/spec PR: ooni/spec#285
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants