Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set hostname rather than ip when channel Init by hostname but not host in http_requst().uri() #1529

Merged
merged 10 commits into from
Sep 6, 2021

Conversation

guodongxiaren
Copy link
Member

@guodongxiaren guodongxiaren commented Aug 16, 2021

当前brpc的实现中,如果http请求,uri没有设置全地址,比如"/get_result?a=b",那么会默认用ip+port作为HTTP 报头的header字段。
但是这样有问题,即使我的Channel初始化的时候使用的是域名,比如:http://www.foo.com,最后对对端看到的还是ip+端口。但像Apache/Nginx这种Web Server都是只是单地址多服务的,http://www.foo.comwww.bar.com 可能是同一个ip和端口(对应的是Apache的地址),然后Web Server会通过Host字段来路由到具体的服务。如果Apache看到的是Ip+端口,则不知道该把请求路由到哪个服务。

当前只能通过发送请求之时给uri填写完整的路径来解决,比如http://www.foo.com/get_result?a=b,但是感觉还是应该框架补全能力比较好。
因为比如我有多个环境,有测试、生产环境。他们访问下游HTTP服务的时候,每个环境的服务请求的接口地址是不同的。
比如生产环境是:http://www.foo.com/get_result?a=b,而测试环境是http://www.foo-test.com/get_result?a=b,我的服务在给Channel初始化的时候已经依据环境来用不同的域名地址进行了初始化。但是当实际给下游发请求的时候(CallMethod之前构造http_request的时候),还需要再判断一遍系统运行环境,再给cntl->http_request().uri() 视环境来赋不同的值。有些冗余,我感觉还是应该框架做好,减少冗余配置和冗余判断。

当然要实现这一feature,可实现的方案有很多。我这里采用的方案是也是当前感觉改动最小,影响最小但能达到目的的一个。受限于Brpc的历史设计,导致从URL解析出来的域名(或机器名,hostname)没有被存储。
因此我这里重新做了ParseUrl和hostname2ip的操作。虽然Init过程中其实之前就有间接调用到这两个函数,但是确实无法被Channel拿到结果。

@guodongxiaren
Copy link
Member Author

@zyearn

@guodongxiaren
Copy link
Member Author

@jamesge @zyearn 辛苦看看

@zyearn
Copy link
Member

zyearn commented Aug 19, 2021

嗯感谢反馈,这两天我们看一下

@guodongxiaren
Copy link
Member Author

嗯感谢反馈,这两天我们看一下

嗯嗯,Apache有个VirtualHost的功能,就是让一个端口,根据域名路由到不同服务

@guodongxiaren
Copy link
Member Author

@zyearn 请问看了吗

_hostname.swap(host);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 这段可以直接写在InitSingle里吗?这样就不用重复写在两个Init里了
  2. 另外这段可以封装成函数,因为在single和集群同时用到了
  3. 感觉可以不用判断协议,对所有协议而言把这个hostname存下来没什么坏处,以后拓展协议用到的话,也更容易。

if (_options.protocol == brpc::PROTOCOL_HTTP) {
const URI& uri = cntl->http_request().uri();
if (uri.host().empty() && !_hostname.empty()) {
const_cast<URI&>(uri).set_host(_hostname);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里直接拿到mutable uri就不需要const转化了吧?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

哦。对。我搞错了,以为拿到的是const的,其实就是mutable的。我改下

@guodongxiaren
Copy link
Member Author

@zyearn 修改了,但是CI的UT测试不稳定。在我没涉及的地方出问题,而且每次不一样。。

@@ -332,6 +333,7 @@ int Channel::Init(const char* ns_url,
NULL, &_options.mutable_ssl_options()->sni_name, NULL);
}
}
ParseHostname(ns_url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果这里返回码不重要的话,原函数需要返回int吗

src/brpc/channel.cpp Outdated Show resolved Hide resolved
@guodongxiaren
Copy link
Member Author

guodongxiaren commented Aug 28, 2021

@zyearn 已经根据反馈修改了代码。有几个地方和你确认一下。

  1. Init用命名服务初始化的重载函数中,我还是判断了HTTP协议,因为如果是file:// 或者consul:// ,那么参数中的ns_url进行ParseHostname无意义。在InitSingle中没有判断协议了。
  2. 当前的实现如果初始化的URL中是机器IP,调用ParseHostname(间接调用hostname2ip)也会解析成功,最后把ip存到 _hostname中,不知道是否需要额外规避掉这种情况,让 _hostname只存储主机名或者网络域名?

@zyearn
Copy link
Member

zyearn commented Aug 28, 2021

@zyearn 已经根据反馈修改了代码。有几个地方和你确认一下。

  1. Init用命名服务初始化的重载函数中,我还是判断了HTTP协议,因为如果是file:// 或者consul:// ,那么参数中的ns_url进行ParseHostname无意义。在InitSingle中没有判断协议了。
  2. 当前的实现如果初始化的URL中是机器IP,调用ParseHostname(间接调用hostname2ip)也会解析成功,最后把ip存到 _hostname中,不知道是否需要额外规避掉这种情况,让 _hostname只存储主机名或者网络域名?
  1. 这样的话两处代码就不统一了。另外_hostname这个名字在channel中改成_service_name是不是更通用一些(https://github.com/apache/incubator-brpc/blob/master/src/brpc/naming_service.h#L53 )?然后在HTTP协议中把_service_name作为hostname,在别的协议中可能担任另外的作用(也可以不用,例如你说的file/consul),这样似乎就不用判断协议了。
  2. 如果按照1改成_service_name的话,就不存在这个问题了,因为这里预期存的是用户的输入。域名就存域名,ip的话就存ip

What do you think?

@guodongxiaren
Copy link
Member Author

@zyearn 改成_service_name了,也取消了协议的判断。另外给 src/brpc/details/http_message.cpp 中从uri.host()给Header补Host字段的逻辑,加上了补port。

因为我这个逻辑走到这里,uri.port()还是 -1 不会被 追加到Host字段中(Host只有域名)。当然Host字段中的port在Http协议中也不是必须的,不过我PR里之前修改的文档描述中写了,对方收到的Host会包含端口。所以我这里和我的文档做一下对齐。

@@ -332,6 +333,7 @@ int Channel::Init(const char* ns_url,
NULL, &_options.mutable_ssl_options()->sni_name, NULL);
}
}
ParseServiceName(ns_url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里能拿到 service_name 吗?拿到的会是什么样?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

拿到的就是 file://list://, consul:// 这些前缀后面的部分,我加了一个 用 file:// 初始化channel的UT ,你可以看看。
ParseHostname就是调用了一下ParseURL,提取了 xxx:// 之后,端口号: 或 路径/ 之前的部分,作为service_name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一整段可以合并一下,少一次parse:

    Parseurl...
    if (_options.protocol == brpc::PROTOCOL_HTTP &&
        ::strncmp(ns_url, "https://", 8) == 0) {
        if (_options.mutable_ssl_options()->sni_name.empty()) {
            _options.mutable_ssl_options()->sni_name = _service_name;
        }
    }

@zyearn
Copy link
Member

zyearn commented Aug 28, 2021

因为我这个逻辑走到这里,uri.port()还是 -1 不会被 追加到Host字段中(Host只有域名)。当然Host字段中的port在Http协议中也不是必须的,不过我PR里之前修改的文档描述中写了,对方收到的Host会包含端口。所以我这里和我的文档做一下对齐。

Hi,这个改动打破backward compatibility了,不知道会有什么影响,另外文档里也说“若用户没有填且URL中包含host,比如http://www.foo.com/path,则http request中会包含"Host: www.foo.com",我们最好可以不改这个。

@@ -332,6 +333,7 @@ int Channel::Init(const char* ns_url,
NULL, &_options.mutable_ssl_options()->sni_name, NULL);
}
}
ParseServiceName(ns_url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一整段可以合并一下,少一次parse:

    Parseurl...
    if (_options.protocol == brpc::PROTOCOL_HTTP &&
        ::strncmp(ns_url, "https://", 8) == 0) {
        if (_options.mutable_ssl_options()->sni_name.empty()) {
            _options.mutable_ssl_options()->sni_name = _service_name;
        }
    }

void Channel::ParseServiceName(const char* server_addr) {
if (ParseURL(server_addr, NULL, &_service_name, NULL) != 0) {
_service_name.clear();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

当时是觉得函数太长,如果目前这个函数这么短了,感觉就不需要封装ParseServiceName了,在调用的地方直接ParseURL(...)即可,如果失败的话_service_name都不会被设置(参考ParseURL的实现),所以_service_name.clear()也不需要了。

@@ -295,6 +295,7 @@ int Channel::InitSingle(const butil::EndPoint& server_addr_and_port,
NULL, &_options.mutable_ssl_options()->sni_name, NULL);
}
}
ParseServiceName(raw_server_address);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

和下面的comment一样,移到前面去可以合并一下

@@ -567,6 +567,8 @@ void MakeRawHttpRequest(butil::IOBuf* request,
os << uri.host();
if (uri.port() >= 0) {
os << ':' << uri.port();
} else if (remote_side.port != 0) {
os << ':' << remote_side.port;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments said, we can't afford to break the backward compatibility.

@guodongxiaren
Copy link
Member Author

guodongxiaren commented Aug 28, 2021

@zyearn 哦,那个if 我忘了合并,明天我改完,你再merge吧。

@zyearn
Copy link
Member

zyearn commented Aug 28, 2021

@zyearn 哦,那个if 我忘了合并,明天我改完,你再merge吧。

Okay

@guodongxiaren
Copy link
Member Author

@zyearn 再看下

@guodongxiaren
Copy link
Member Author

@zyearn 有空看下,CI也过了

@@ -114,7 +114,9 @@ URL的一般形式如下图:

若用户没有填且URL中包含host,比如http://www.foo.com/path,则http request中会包含"Host: www.foo.com"。

若用户没有填且URL不包含host,比如"/index.html?name=value",则框架会以目标server的ip和port为Host,地址为10.46.188.39:8989的http server将会看到"Host: 10.46.188.39:8989"。
若用户没有填且URL不包含host,比如"/index.html?name=value",但如果Channel初始化的地址包含域名,则框架会以域名作为Host,比如"http://www.foo.com",该http server将会看到"Host: www.foo.com"。如果地址是"http://www.foo.com:8989",则该http server将会看到"Host: www.foo.com"。

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最后这个Host是不是应该是"Host: www.foo.com:8989"?

Copy link
Member Author

@guodongxiaren guodongxiaren Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不是。在http_message.cpp中:

    if (h->GetHeader("host") == NULL) {
        os << "Host: ";
        if (!uri.host().empty()) {
            os << uri.host();
            if (uri.port() >= 0) {
                os << ':' << uri.port();
            }
        } else if (remote_side.port != 0) {
            os << remote_side;
        }
        os << BRPC_CRLF;
    }

URL不包含host,但是channel初始化的时候包含的情况,我给uri的host字段赋了值。会走到:if (!uri.host().empty())的if 逻辑中。但是下面的uri.port() 是-1,所以不会把:port 追加到host中。
我前面有个commit是改了http_message.cpp,改成:

    if (h->GetHeader("host") == NULL) {
        os << "Host: ";
        if (!uri.host().empty()) {
            os << uri.host();
            if (uri.port() >= 0) {
                os << ':' << uri.port();
            } else if (remote_side.port != 0) {  // 加了这个逻辑
                os << ':' << remote_side.port;
            }
        } else if (remote_side.port != 0) {
            os << remote_side;
        }
        os << BRPC_CRLF;
    }

从remote_side中拿port拼接到Host字段。不过你说可能有兼容性问题,所以我又去掉了。

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, 我之前说的谦容性问题是,实现将Host: www.aa.com 变成了 Host: www.aa.com:80。现在这样实现的话 www.aa.com和www.aa.com:8899对应到请求全部都是Host: www.aa.com,不太符合预期 https://datatracker.ietf.org/doc/html/rfc2616#section-14.23

符合谦容性的实现是:

要想个办法把ParseUrl出来的port,也在set_host的时候顺便set_port,这样http_message.cpp就不用改了。简单看了下,似乎把parse出来的port追加到_service_name后面,然后替换set_host为uri.SetHostAndPort(_service_name)就可以了。看看可行么?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有一点问题,如果调用的是 channel.Init("www.aa.com", 8899),那么对方收到的Host还是不包含端口号的。因为InitSingle中解析的原始地址,是Init的第一个参数。
感觉不存在十全十美的解决方案,我感觉还是要在CallMethod的时候,当uri不包含host的时候,把host和port都拼上。不影响uri中包含host时的处理逻辑。

另外一个方案就是在两个Init重载中,记录一下(新增bool成员变量存储标记,或者直接存原始port),InitSingle中判断是否单独设置了port参数。

If user does not set host header and the URL does not have host as well, for example "/index.html?name=value", framework sets `Host` header with IP and port of the target server. A http server at 10.46.188.39:8989 should see `Host: 10.46.188.39:8989`.
If user does not set host header and the URL does not have host as well, for example "/index.html?name=value", but if the address initialized by the channel contains domain name. framework sets `Host` header with domain name of the target server. if this address is "http://www.foo.com", this http server should see `Host: www.foo.com`. if this address is "http://www.foo.com:8989", this http server should be see `Host: www.foo.com`.

If user does not set host header and the URL does not have host as well, for example "/index.html?name=value", and the address initialized by the channel doesn't contain domain name. framework sets `Host` header with IP and port of the target server. A http server at 10.46.188.39:8989 should see `Host: 10.46.188.39:8989`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. There is an extra space before 'for example'
  2. it should be ',' instead of '.' after 'if...'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改了

@guodongxiaren
Copy link
Member Author

@zyearn 你的评论我回复了。看下

Copy link
Member Author

@guodongxiaren guodongxiaren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zyearn 我改了一下InitSingle的函数参数,多了一个raw_port参数(默认值-1),让InitSingle可以感知是从哪个Init重载调用进来的。

如果raw_port不等于-1,那么就是用它拼接到hostname后面作为 _service_name。
如果raw_port为-1,就去解析 raw_server_address(其实Init的第一个参数)字符串,看能不能解析到port。能的话就追加到hostname后面作为 _service_name,否则,直接用 hostname作为 _service_name。

CallMethod中调用SetHostAndPort

@@ -256,7 +256,7 @@ int Channel::Init(const char* server_addr, int port,
return -1;
}
}
return InitSingle(point, server_addr, options);
return InitSingle(point, server_addr, options, port);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

channel.Init("www.aa.com", 8899,...) 这个Init函数会调用这个带port参数的InitSingle

if (_options.protocol == brpc::PROTOCOL_HTTP &&
::strncmp(raw_server_address, "https://", 8) == 0) {
std::string scheme;
int* port_out = raw_port == -1 ? &raw_port: NULL;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果raw_port 是默认值-1,那么就取它的地址,否则取NULL(表示raw_port已经有值了,不需要再从raw_server_address中解析)

int* port_out = raw_port == -1 ? &raw_port: NULL;
ParseURL(raw_server_address, &scheme, &_service_name, port_out);
if (raw_port != -1) {
_service_name.append(":").append(std::to_string(raw_port));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostname:port 组合成 _service_name

::strncmp(ns_url, "https://", 8) == 0) {
std::string scheme;
int raw_port = -1;
ParseURL(ns_url, &scheme, &_service_name, &raw_port);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果是name_service的Init,那么直接取尝试从ns_url中解析port

if (_options.protocol == brpc::PROTOCOL_HTTP) {
URI& uri = cntl->http_request().uri();
if (uri.host().empty() && !_service_name.empty()) {
uri.SetHostAndPort(_service_name);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

一次性给uri的Host,port赋值

@@ -213,8 +213,10 @@ friend class SelectiveChannel;
int InitChannelOptions(const ChannelOptions* options);
int InitSingle(const butil::EndPoint& server_addr_and_port,
const char* raw_server_address,
const ChannelOptions* options);
const ChannelOptions* options,
int raw_port = -1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitSingle增加了一个带默认值的参数,raw_port

@zyearn
Copy link
Member

zyearn commented Sep 4, 2021

LGTM

@zyearn zyearn merged commit bc343db into apache:master Sep 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants