Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't cache and don't forward hop-by-hop headers #650

Merged
merged 24 commits into from
Dec 29, 2016
Merged

Conversation

vankoven
Copy link
Contributor

Fix issue #409

  • Parse Connection header during parsing HTTP message;
  • Store raw headers listed in Connection header in temporary array allocated in message's pool: every raw header is marked as hop-by-hop if it's name present in that array. When connection header parsed in the first time, parser checks already parsed raw headers and mark them as hop-by-hop headers if needed. If Connection header does not contain any raw headers names no extra allocations or string comparison happen;
  • Special headers have their own marking procedure which does not involve string comparisons. Some of the special headers are always hop-by-hop (Connection, we also mark Server in responses as hop-by-hop) and using generic comparison like for Raw headers will ruin performance;
  • Unit tests for responses and requests with hop-by-hop marking checks;
  • little optimisation of __hdr_lookup(): reduce string comparisons.

TBD:
When response contains Connection: keep-alive, Keep-alive header will be marked as hop-by-hop header and will not be saved in cache (that was the requirement in #409 ). RFC 7230 6.1 also say:

 A proxy or gateway MUST parse a received Connection header field before
   a message is forwarded and, for each connection-option in this field,
   remove any header field(s) from the message with the same name as the
   connection-option, and then remove the Connection header field itself
   (or replace it with the intermediary's own connection options for the
   forwarded message).

Currently we have no realised way to set default Keep-Alive header (not found in issues but here is corresponding TODO: https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/http.c#L639 ). So question is: delete Keep-Alive header in forwarded messages or not? But messages served from cache will have no such header. Currently i remove it from both forwarded messages: response and request, disabling that is one-line patch.

msg->parser.hdr.flags |= TFW_STR_HBH_HDR;

/**
* Mark raaw header @hdr as hop-by-hop if its name was listed in Connection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raaw -> raw

if (!ht)
return;

for (i = 0; i < ht->off; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does RFC allow setting headers, mentioned in Connection, before Connection header? The code is dangerous, just consider the example:

    GET / HTTP/1.1\r\n
    Connection: keep-alive, Host, User-Agent\r\n
    Host: foo.com\r\n
    User-Agent: malicious client\r\n
    \r\n

So server will receive malformed request.

We should run the loop at least from TFW_HTTP_HDR_RAW. Probably there is some limitation which headers can be mentioned in Connection header... The other way (I think the safest one) is to remove Keep-Alive only and ignore any other mentioned headers in Connection. However, we still use current mechanism for the headers removal such that we'll be able to remove other headers which are known to be mentioned in Connection. I argue against blind headers removal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. RFC does not have any restrictions where headers listed in Connection header must be placed. So both variants are ok: before or after Connection header. More over i tried to set up custom hop-by-hop headers in ngingx and apache, both allow setting custom headers before and after the header. Here example from RFC 6455 with hop-by-hop header before Connection header:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
  1. Table you point to stores names for only raw headers. I agree, it could be confusing, but i used data structure which suit me perfectly. In situation you show user-agent and every other spec header listed in connection header (except keep-alive) will not be removed

Obsoleted now, in new revision this array is statically allocated

TfwStr *hdr, *append;
int r;
TfwHttpHdrTbl *ht = hm->parser.hbh_parser.raw;
if (!ht) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't mentioned in our Coding Style, but FreeBSD KNF requires split variables definitions and code. It's bit hard to read when code immediately follows variable definitions immediately.

}

/**
* Add header name listed in Connection header to rable of raw headers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rable -> table

if (hid == ht->size)
if (__tfw_http_msg_grow_hdr_tbl(&hm->parser.hbh_parser.raw,
hm->pool))
return -ENOMEM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true and powerful headers parser. Do we really allow that Connection mentioning more than 16 header? I believe it has sense to block the header. Meantime it will make the code faster and easier and that's good.

@@ -917,8 +1069,24 @@ __parse_connection(TfwHttpMsg *hm, unsigned char *data, size_t len)
* it could be names of any headers, including custom headers.
*/
__FSM_STATE(I_ConnOther) {
__FSM_I_MATCH_MOVE(token, I_ConnOther);
/* Unwind __FSM_I_MATCH_MOVE(token, I_ConnOther): */
__fsm_n = __data_remain(p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is MATCH_MOVE macroses which embody the code.

case 'k':
if (likely(__data_available(p, 11)
&& C4_INT_LCM(p + 1, 'e', 'e', 'p', '-')
&& C4_INT_LCM(p + 5, 'a', 'l', 'i', 'v')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussed bug to fix

@@ -877,6 +877,219 @@ TEST(http_parser, cookie)
"\r\n");
}

TEST(http_parser, req_hop_by_hop)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test case described above for "Connection: Host\r\n' to mangled message.

@@ -128,7 +128,7 @@ module_exit(ttls_exit);

MODULE_AUTHOR("Tempesta Technologies");
MODULE_VERSION("2.3.0");
MODULE_LICENSE("GPLv2");
MODULE_LICENSE("GPL v2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it taints kernel? The change is fine if not.

@krizhanovsky
Copy link
Contributor

Please address the comment #409 (comment) or move the requirement to #634

@vankoven
Copy link
Contributor Author

vankoven commented Dec 4, 2016

Changes since last review:

  • rebased to current master
  • fixed error in current master with Transfer-Encoding header: the header now is SPEC header but it missed in __http_msg_hdr_val()
  • Connection header is limited to have less than 16 connection tokens for raw headers. Otherwise request or response will be filtered
  • Drop message if some of connection tokens is end-to-end headers used by tempesta and parsed by TFW_HTTP_PARSE_RAWHDR_VAL macro. Comparing of connection tokens with all headers defined in RFC 7230, 7231 as end-to-end will lead to a lot of string comparisons inside of parsing of Connection header
  • Unit tests updated to have more test cases for hop-by-hop headers marking

The feature have no special requirements for #634 . (Headers listed there are not hop-by-hop)

I tried to compare behaviour with nginx, but it support hop-by-hop headers in terms of obsoleted RFC 2616, not 7230

} else {
--ce->hdr_num;
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that n is not used as an array index, perhaps it's best to postpone its calculation until the else if case.

if () {
} else if (field - resp->h_tbl->tbl < TFW_HTTP_HDR_RAW) {
} else {
}

else if (n < TFW_HTTP_HDR_RAW)
h = &empty;
else
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment above regarding the use of n variable. In this case, n is not used anywhere else.

* Http headers table.
*
* Singular headers (in terms of RFC 7230 3.2.2) go first to protect header
* repetition attacks. See __hdr_is_singular() and don't forget to
* update the static headers array when add a new singular header here.
* If new header can be hop-by-hop header update of __parse_connection() is
* also needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole comment should be reworked to clearly state which arrays and where must be updated when adding a certain type of an HTTP header field.

* group.
*
* @spec - bit array for special headers. Hop-by-hop special header is
* stored as (0x1 << tfw_http_hdr_t[hid]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a corresponding proper BUILD_BUG_ON() is justified to ensure that the type used for the structure member can accommodate the necessary number of headers (has the required number of bits).

@@ -262,6 +288,7 @@ typedef struct {
* @content_length - the value of Content-Length header field;
* @conn - connection which the message was received on;
* @jtstamp - time the message has been received, in jiffies;
* @keep_alive - keep-alive timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout specified in the Keep-Alive header field? Or something else?

memmove(&hbh->raw[i], &hbh->raw[i + 1],
sizeof(TfwStr) * (TFW_HTTP_HDR_NUM - i -1));
TFW_STR_INIT(&hbh->raw[TFW_HTTP_HDR_NUM - 1]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to exclude this header from further comparisons, then why not use the same TFW_STR_HBH_HDR flag on headers in hbh->raw[] and skip those headers in the comparison loop? That is easier and faster than doing a memmove().

@@ -619,6 +739,7 @@ __FSM_STATE(st_curr) { \
/* The header value is fully parsed, move forward. */ \
if (saveval) \
__msg_hdr_chunk_fixup(p, __fsm_n); \
MARK_SPEC_HBH(id); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be made an inline function to complement mark_raw_hbh().

hdr = &hbh->raw[i];
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that there should be the current index into the array - analogous to ht->off. You don't have to search for the current entry.


h = &ht->tbl[hid];
TFW_STR_FOR_EACH_DUP(d, h, end)
d->flags |= TFW_STR_HBH_HDR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to mark each duplicate as HBH? I don't think so, but maybe there's a case to prove otherwise?

* name only is enough in @hdr.
*/
void
tfw_http_msg_mark_hbh_hdr(TfwHttpMsg *hm, TfwStr *hdr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this function (under a better name) belongs in http_parser.c.
If I understand correctly, the reason for placing it here was that __hdr_lookup() was located here, and it's a local (static) function to this sub-module. In my opinion though it should be the other way around: __hdr_lookup() should be made public (and given a proper name for that) and used from http_parser.c

Copy link
Contributor

@keshonok keshonok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Only minor changes requested.

@@ -2002,7 +2080,7 @@ tfw_http_parse_req(void *req_data, unsigned char *data, size_t len)
TfwHttpReq *req = (TfwHttpReq *)req_data;
__FSM_DECLARE_VARS(req);

__hbh_parser_init(&parser->hbh_parser, true);
__hbh_parser_init_req(req);
Copy link
Contributor

@keshonok keshonok Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called once per a message. However here it's called each time the parsing function is called. If a message is transferred one byte at a time, then this will be executed for each byte. That's unnecessary. I would suggest to call it in Req_0 state, and make another state Req_0_CRLF for skipping empty lines at the start of a message. What do you think?

@@ -3452,7 +3530,7 @@ tfw_http_parse_resp(void *resp_data, unsigned char *data, size_t len)
TfwHttpResp *resp = (TfwHttpResp *)resp_data;
__FSM_DECLARE_VARS(resp);

__hbh_parser_init(&parser->hbh_parser, false);
__hbh_parser_init_resp(resp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment above.

unsigned int id;

for (id = 0; id < TFW_HTTP_HDR_RAW; ++id) {
hdr = &ht->tbl[id];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TfwStr *hdr = &ht->tbl[id];

Or perhaps even

TfwStr *hdr = &hm->h_tbl->tbl[id];

@@ -495,29 +516,52 @@ static void
mark_raw_hbh(TfwHttpMsg *hm, TfwStr *hdr)
{
TfwHttpHbhHdrs *hbh = &hm->parser.hbh_parser;
TfwStr * hbh_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space.

TfwStr *hbh_name;

* corresponding hop-by-hop header was found.
*/
for (i = 0; i < hbh->off; ++i) {
hbh_name = &hbh->raw[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TfwStr *hbh_name = &hbh->raw[i];

for (i = 0; i < hbh->off; ++i) {
hbh_name = &hbh->raw[i];
if ((hbh_name->flags & TFW_STR_HBH_HDR)
&& !(tfw_stricmpspn(&hbh->raw[i], hdr, ':'))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of this if statement is better visible in this case when the opening bracket is on a separate line.

/* Add spec header of id @hid and @name to list of spec hop-by-hop headers */
#define TRY_HBH_SPEC_HDR_LAMBDA(name, hid, lambda) \
TRY_HBH_TOKEN(name, { \
BUG_ON(hid > sizeof(parser->hbh_parser.spec) * CHAR_BIT);\
Copy link
Contributor

@keshonok keshonok Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an one-time operation per whole Tempesta run. Please take a look at BUILD_BUG_ON() macro which is a compile-time check.

{
TfwHttpHbhHdrs *hbh_hdrs = &req->parser.hbh_parser;
/* Connection is hop-by-hop header by RFC 7230 6.1 */
hbh_hdrs->spec |= 0x1 << TFW_HTTP_HDR_CONNECTION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this should be performed only once per message, shouldn't this be a direct assignment, not an OR operation?

*/
hbh_hdrs->spec |= 0x1 << TFW_HTTP_HDR_CONNECTION;
if (!req)
hbh_hdrs->spec |= 0x1 << TFW_HTTP_HDR_SERVER;
hbh_hdrs->spec |= 0x1 << TFW_HTTP_HDR_SERVER;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment above regarding direct assignment vs OR operation.

@@ -1348,6 +1348,7 @@ __parse_transfer_encoding(TfwHttpMsg *hm, unsigned char *data, size_t len)
/* Main (parent) HTTP request processing states. */
enum {
Req_0,
Req_CRLF,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this may appear too generic. Something like Req_0_CRLF may visibly clarify that this is one of the initial states.

__FSM_STATE(Req_0) {
__hbh_parser_init_req(req);
/* fall through */
Copy link
Contributor

@keshonok keshonok Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that it's not correct to move from one parser state to another by just falling through. Generally speaking, let's suppose there are no more bytes to process. The parser still believes it's in the state where it came from, while it should be in the new state already, and it should return to the new state when more bytes are available.

All moves from one FSM state to another must be done via a macro that moves the parser to the new state. Please check a few other occurrences of this fall through use.

Copy link
Contributor Author

@vankoven vankoven Dec 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked out all the places in code, where fall through used instead of FSM_MOVE directives. No errors there and fall through can be replaced with __FSM_JMP. Since desired state is the next one fall through is used in order not to make jump. I have no idea if jump could be optimized out, so fall through is the best option here.

__FSM_STATE saves current parser state to __fsm_const_state: https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/http_parser.c#L120. In all the places in parser fall through is used only when we do not move our current position in message and we want to work with the same data in next stage or stages.

Any flavour of FSM_MOVE* is not an option here since parser->to_go will be set to 0 for situation you are describing ( https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/http_parser.c#L137 ). And next stage will take next character, not the same one.

Copy link
Contributor

@keshonok keshonok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@@ -466,6 +470,186 @@ parse_int_hex(unsigned char *data, size_t len, unsigned long *acc)
return CSTR_POSTPONE;
}

/**
* Add spec header indexes to list of hop-by-hop headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to issue a special fix now, but comments are written in English, so they should obey English rules: start from capital letter and finish with dot. Exception is same-line comments which are continuation of C operators, i.e.

     a += b; /* this is same-line comment */

* adjusting as well as saves cache storage.
*
* Headers unconditionaly treated as hop-by-hop must be listed in
* __hbh_parser_init() function and must be members of Special headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the comment is outdated since there is no function with name __hbh_parser_init().

}

/**
* Mark existing spec headers of http message @hm as hop-by-hop if they was
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"they was" -> "they were"

static int
__hbh_parser_add_data(TfwHttpMsg *hm, char *data, unsigned long len, bool last)
{
TfwStr *hdr = NULL, *append;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to initialize hdr.

parser->hbh_parser.spec |= 0x1 << hid; \
if (!TFW_STR_EMPTY(&msg->h_tbl->tbl[hid])) \
msg->h_tbl->tbl[hid].flags |= TFW_STR_HBH_HDR; \
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro is used in only one place, so the code would be more readable if you place the macro as inline code at the appropriate place.

msg->h_tbl->tbl[hid].flags |= TFW_STR_HBH_HDR; \
})

#define TRY_SPEC_HBH_HDR(name, hid) TRY_SPEC_HBH_HDR_LAMBDA(name, hid, {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro is not used at all.

__FSM_STATE(Req_0) {
__hbh_parser_init_req(req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you create a new parser state for the initialization routine? It seems you can call _hbh_parser_init*() functions in tfw_http_msg_alloc() where base parser is initialized.

__FSM_STATE(Resp_0) {
__hbh_parser_init_resp(resp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as for request


/* Connection header lists end-to-end spec headers */
FOR_REQ(REQ_HBH_START
"Connection: Host, Content-Length, Content-Type, Connection,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the request must be blocked since there is special header in Connection

@vankoven vankoven merged commit 72a24ae into master Dec 29, 2016
@vankoven vankoven deleted the ik-hbh-parser branch December 29, 2016 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants