Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL 相关问题。 #1096

Open
feifeiiong opened this issue Apr 23, 2020 · 6 comments
Open

SSL 相关问题。 #1096

feifeiiong opened this issue Apr 23, 2020 · 6 comments
Labels
bug the code does not work as expected

Comments

@feifeiiong
Copy link

feifeiiong commented Apr 23, 2020

Describe the bug (描述bug)

io_buf中处理ssl相关的逻辑,未考虑SSL_write和SSL_read可能出现的所有情况,导致出现异常。
To Reproduce (复现方法)
建立ssl连接后,第三方ssl服务端因为某些原因关闭socket,客户端持续写入/读取,框架处理出现异常,出现异常的代码位置:
socket.cpp
1855行,DoRead阶段对ssl的错误码处理。
1699行,DoWrite阶段对ssl的错误码处理。
iobuf.cpp
1665行,append_from_SSL_channel 函数中的SSL_read:
const int rc = SSL_read(ssl, _block->data + _block->size, read_len);
*ssl_error = SSL_get_error(ssl, rc);
rc =0 时,可能收到了close_notify,此时应判断ssl_error,以处理可能的EOF,代码中未见。
此外,也可能出现SSL_ERROR_ZERO_RETURN,也未见处理。

970行:write相关逻辑:
cut_into_SSL_channel函数
const int nw = SSL_write(ssl, r.block->data + r.offset, r.length);
if (nw > 0) {
pop_front(nw);
}
*ssl_error = SSL_get_error(ssl, nw);

未见处理nw = 0,以及相应的ssl error,此处可能socket已关闭。

Expected behavior (期望行为)
正确处理SSL相关逻辑。
此外,还有一些疑问:

  1. 查看代码发现SSL相关逻辑混用BIO和SSL_read/write 接口,但这种写法似乎不推荐,大佬这么写的原因是?
  2. 上述未正确处理的错误,都是通过查看 man的ssl相关api得出,并且实际也遇到了这些问题,当时未处理这些错误的原因是什么?
  3. socket.cpp 未调用SSL_shutdown相关逻辑,不调用该函数的原因是?
  4. 对于一条已经过ssl认证的tcp,从池中再次拿出该socket,为什么需要再次发起握手流程?

Versions (各种版本)
OS:
Compiler:
brpc:
protobuf:

Additional context/screenshots (更多上下文/截图)

@feifeiiong
Copy link
Author

这个问题有什么进展吗,各位大佬?

@old-bear
Copy link
Contributor

old-bear commented Apr 24, 2020

  1. DoRead这个函数设计返回0即约定表示遇到EOF,这也和非SSL情况统一,同时外层input_messenger.cpp中也会相应处理,因此SSL_read return 0直接返回0就行。由于brpc中不允许半关闭读写的情况,对于对端EOF情况只需直接close本端就行。
  2. 另外brpc中一个连接要么一直是SSL,要么永远是非SSL,不会来回再SSL状态间切换,因此也不需要SSL_shutdown。
  3. cut_into_SSL_channel在nw=0时也会设置ssl_error,外层DoWrite也会捕获感知设置errno。
  4. 从池中(SocketPool)拿出的Socket,是不会再次ssl_handshake的。因为连接池模式保证一次请求结束前独占这个Socket,后续拿出时其ssl_state应该set过了。
  5. BIO的api调用,一部分是为了证书buffer,另一部分是为了能让用户设置SSL底层读写的buffer大小,具体读写没有用BIO_read/write

@feifeiiong
Copy link
Author

feifeiiong commented Apr 25, 2020

感谢您的回复!第四点确实是如您所说。关于1,3点,有一些不同的看法:
DoRead 的设计确实是读到0即代表约定遇到EOF,但是SSL_read为0时的处理并不完备,以下内容摘自man:
SSL_read return values:

'>' 0
The read operation was successful; the return value is the number of bytes actually read from the TLS/SSL connection.

0

The read operation was not successful. The reason may either be a clean shutdown due to a "close notify" alert sent by the peer (in which case the SSL_RECEIVED_SHUTDOWN flag in the ssl shutdown state is set (see ssl_shutdown(3), ssl_set_shutdown(3)). It is also possible, that the peer simply shut down the underlying transport and the shutdown is incomplete. Call SSL_get_error() with the return value ret to find out, whether an error occurred or the connection was shut down cleanly ( SSL_ERROR_ZERO_RETURN

SSL_get_error Return Values:
SSL_ERROR_ZERO_RETURN
The TLS/SSL connection has been closed.
SSL_ERROR_SYSCALL
Some I/O error occurred. The OpenSSL error queue may contain more information on the error. If the error queue is empty (i.e. ERR_get_error() returns 0), ret can be used to find out more about the error: If ret == 0, an EOF was observed that violates the protocol. If ret == -1, the underlying BIO reported an I/O error (for socket I/O on Unix systems, consult errno for details).

在iobuf中是这样处理的:


         const int rc = SSL_read(ssl, _block->data + _block->size, read_len);
        *ssl_error = SSL_get_error(ssl, rc);
        if (rc > 0) {
            const IOBuf::BlockRef r = { (uint32_t)_block->size, (uint32_t)rc, _block };
            _push_back_ref(r);
            _block->size += rc;
            if (_block->full()) {
                Block* const saved_next = _block->portal_next;
                _block->dec_ref();  // _block may be deleted
                _block = saved_next;
            }
            nr += rc;
        } else {
            if (rc < 0) {
                if (*ssl_error == SSL_ERROR_WANT_READ
                    || (*ssl_error == SSL_ERROR_SYSCALL
                        && BIO_fd_non_fatal_error(errno) == 1)) {
                    // Non fatal error, tell caller to read again
                    *ssl_error = SSL_ERROR_WANT_READ;
                } else {
                    // Other errors are fatal
                    return rc;
                }
            }
            return (nr > 0 ? nr : rc);

在rc == 0时,返回,并置错误码,此处的错误码,有可能是SSL_ERROR_SYSCALL/SSL_ERROR_ZERO_RETURN, 在socket.cpp:

    int ssl_error = 0;
    ssize_t nr = _read_buf.append_from_SSL_channel(_ssl_session, &ssl_error, size_hint);
    switch (ssl_error) {
    case SSL_ERROR_NONE:  // `nr' > 0
        break;
            
    case SSL_ERROR_WANT_READ:
        // Regard this error as EAGAIN
        errno = EAGAIN;
        break;
            
    case SSL_ERROR_WANT_WRITE:
        // Disable renegotiation
        errno = EPROTO;
        return -1;

    default: {
        const unsigned long e = ERR_get_error();
        if (nr == 0) {
            // Socket EOF or SSL session EOF
        } else if (e != 0) {
            LOG(WARNING) << "Fail to read from ssl_fd=" << fd()
                         << ": " << SSLError(e);
            errno = ESSL;
        } else {
            // System error with corresponding errno set
            PLOG(WARNING) << "Fail to read from ssl_fd=" << fd();
        }
        break;
    }

如上,以上错误码被default捕获,而此刻nr 有可能!=0 (iobuf.cpp : return (nr > 0 ? nr : rc);), 从而导致该EOF被忽略,导致socket无法关闭。在实际应用中,该问题已经频繁被触发,打印日志类似:
Fail to read from ssl_fd=id:Success
DoWrite有类似的问题,这里就不贴代码了,烦请您看看!

@feifeiiong
Copy link
Author

另外,关于main socket,健康检查时,如果设置了SSL options,会尝试进行ssl连接。想问下这个操作是否有必要?

@old-bear
Copy link
Contributor

感谢指正,我看了下 DoRead中确实存在bug,但具体问题好像与你描述的不同:

  1. append_from_SSL_channel中不断调用SSL_read,已经读入了一些内容,nr>0,此时最后一次SSL_read返回rc=0,而append_from_SSL_channel返回nr (>0)
  2. DoRead中进入default分支,打印了Fail to read from ssl_fd=id,返回nr (>0)
  3. InputMessenger中调用DoRead得到nr>0,处理消息,然后循环while (!read_eof),继续调用DoRead -> SSL_read,此时则rc=0,一路return出去,被当做eof

因此,我这边看这个bug主要在于会错误打印一句warning日志,应该不影响fd关闭或者fd泄漏
如果你这边有具体复现场景描述,或者相关日志,希望能提供一下,谢谢

另外,SSL握手目前被视作一种连通性指标,如果server拒绝与client建立SSL(无论何种因素),那么后续任何请求肯定无法成功。

@memory-overflow
Copy link

memory-overflow commented Dec 22, 2020

我也遇到一样的问题。

image

会一直打印类似的日志,然后服务器磁盘被这个日志写满了。也是因为这个bug引起的吗?

@wwbmmm wwbmmm added the bug the code does not work as expected label Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug the code does not work as expected
Projects
None yet
Development

No branches or pull requests

4 participants