Better handling of Accept-Encoding / Content-Encoding decompression (fixes #562) #729

Merged
merged 2 commits into from
Jun 29, 2021
65 changes: 65 additions & 0 deletions docs/README.md
@@ -104,3 +104,68 @@ class Client
end
end
```

### HTTP Compression

The `Accept-Encoding` request header and `Content-Encoding` response header
are used to control compression (gzip, etc.) over the wire. Refer to
[RFC-2616](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) for details.
(For clarity: these headers are **not** used for character encoding, e.g. `utf-8`,
which is specified in the `Accept` and `Content-Type` headers.)

Unless you have specific requirements otherwise, we recommend **not** setting
the `Accept-Encoding` header on HTTParty requests. In this case, `Net::HTTP`
will set a sensible default compression scheme and automatically decompress the response.

If you explicitly set `Accept-Encoding`, there be dragons:

* If the HTTP response `Content-Encoding` received on the wire is `gzip` or `deflate`,
`Net::HTTP` will automatically decompress it, and will omit `Content-Encoding`
from your `HTTParty::Response` headers.

* For encodings `br` (Brotli) or `compress` (LZW), HTTParty will automatically
decompress the response if your project includes the `brotli` or `ruby-lzws` gem, respectively.
**Warning:** Support for these encodings is experimental and not fully battle-tested.
Similar to above, if decompression succeeds, `Content-Encoding` will be omitted
from your `HTTParty::Response` headers.

* For other encodings, `HTTParty::Response#body` will return the raw, still-compressed byte string,
and you'll need to inspect the `Content-Encoding` response header and decompress it yourself.
In this case, `HTTParty::Response#parsed_response` will be `nil`.

* Lastly, you may use the `skip_decompression` option to disable all automatic decompression
and always get `HTTParty::Response#body` in its raw form along with the `Content-Encoding` header.

```ruby
# Accept-Encoding=gzip,deflate can be safely assumed to be auto-decompressed

res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => 'gzip,deflate,identity' })
JSON.parse(res.body) # safe


# Accept-Encoding=br,compress requires third-party gems

require 'brotli'
require 'lzws'
res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => 'br,compress' })
JSON.parse(res.body)


# Accept-Encoding=* may return unhandled Content-Encoding

res = HTTParty.get('https://example.com/test.json', headers: { 'Accept-Encoding' => '*' })
encoding = res.headers['Content-Encoding']
if encoding
  JSON.parse(your_decompression_handling(res.body, encoding))
else
  # Content-Encoding not present implies decompressed
  JSON.parse(res.body)
end


# Gimme the raw data!

res = HTTParty.get('https://example.com/test.json', skip_decompression: true)
encoding = res.headers['Content-Encoding']
JSON.parse(your_decompression_handling(res.body, encoding))
```
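
The `your_decompression_handling` call in the examples above is a placeholder, not an HTTParty method. A minimal sketch of what such a helper might look like, using only Ruby's standard `zlib` library, follows. The set of encodings it covers and the raw-DEFLATE fallback are assumptions made for illustration.

```ruby
require 'zlib'

# Hypothetical helper matching the placeholder name used in the examples above.
# It is not part of HTTParty; adapt the encodings to what your server sends.
def your_decompression_handling(body, encoding)
  case encoding.to_s.strip.downcase
  when '', 'identity'
    body
  when 'gzip', 'x-gzip'
    Zlib.gunzip(body)
  when 'deflate'
    begin
      # "deflate" normally means a zlib-wrapped stream (RFC 1950).
      Zlib::Inflate.inflate(body)
    rescue Zlib::DataError
      # Assumption: fall back to raw DEFLATE (RFC 1951), which some servers send.
      Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(body)
    end
  else
    raise ArgumentError, "unsupported Content-Encoding: #{encoding.inspect}"
  end
end
```

Raising on unknown encodings is one possible design choice; returning `nil`, as `HTTParty::Decompressor` does, would work equally well if the caller checks for it.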
17 changes: 17 additions & 0 deletions lib/httparty.rb
@@ -18,6 +18,7 @@
require 'httparty/logger/logger'
require 'httparty/request/body'
require 'httparty/response_fragment'
require 'httparty/decompressor'
require 'httparty/text_encoder'
require 'httparty/headers_processor'

@@ -401,6 +402,22 @@ def ssl_version(version)
default_options[:ssl_version] = version
end

# Deactivate automatic decompression of the response body.
# This will require you to explicitly handle body decompression
# by inspecting the Content-Encoding response header.
#
# Refer to docs/README.md "HTTP Compression" section for
# further details.
#
# @example
#   class Foo
#     include HTTParty
#     skip_decompression
#   end
def skip_decompression(value = true)
default_options[:skip_decompression] = !!value
end

# Allows setting of SSL ciphers to use. This only works in Ruby 1.9+.
# You can get a list of valid specific ciphers from OpenSSL::Cipher.ciphers.
# You also can specify a cipher suite here, listed here at openssl.org:
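
The class-level `skip_decompression` setter in this hunk stores a boolean in `default_options`. The stand-in below sketches how the `!!value` coercion behaves for different arguments. `MiniParty` and its `default_options` are hypothetical simplifications written for this illustration, not HTTParty's implementation.

```ruby
# Hypothetical stand-in for the class-level option pattern; not HTTParty itself.
module MiniParty
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def default_options
      @default_options ||= {}
    end

    # Mirrors the setter in the diff: any truthy value becomes true,
    # nil and false become false.
    def skip_decompression(value = true)
      default_options[:skip_decompression] = !!value
    end
  end
end

class Foo
  include MiniParty
  skip_decompression # no argument: defaults to true
end

class Bar
  include MiniParty
  skip_decompression(nil) # coerced to false
end

puts Foo.default_options.inspect
puts Bar.default_options.inspect
```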
92 changes: 92 additions & 0 deletions lib/httparty/decompressor.rb
@@ -0,0 +1,92 @@
# frozen_string_literal: true

module HTTParty
  # Decompresses the response body based on the Content-Encoding header.
  #
  # Net::HTTP automatically decompresses Content-Encoding values "gzip" and "deflate".
  # This class will handle "br" (Brotli) and "compress" (LZW) if the requisite
  # gems are installed. Otherwise, it returns nil if the body data cannot be
  # decompressed.
  #
  # @abstract Read the HTTP Compression section for more information.
  class Decompressor

    # "gzip" and "deflate" are handled by Net::HTTP
    # hence they do not need to be handled by HTTParty
    SupportedEncodings = {
      'none' => :none,
      'identity' => :none,
      'br' => :brotli,
      'compress' => :lzw
    }.freeze

    # The response body of the request
    # @return [String]
    attr_reader :body

    # The Content-Encoding value used to encode the body
    # @return [String] e.g. "br"
    attr_reader :encoding

    # @param [String] body - the response body of the request
    # @param [String] encoding - the Content-Encoding value used to encode the body
    def initialize(body, encoding)
      @body = body
      @encoding = encoding
    end

    # Perform decompression on the response body
    # @return [String] the decompressed body
    # @return [nil] when the response body is nil or cannot be decompressed
    def decompress
      return nil if body.nil?
      return body if encoding.nil? || encoding.strip.empty?

      if supports_encoding?
        decompress_supported_encoding
      else
        nil
      end
    end

    protected

    def supports_encoding?
      SupportedEncodings.keys.include?(encoding)
    end

    def decompress_supported_encoding
      method = SupportedEncodings[encoding]
      if respond_to?(method, true)
        send(method)
      else
        raise NotImplementedError, "#{self.class.name} has not implemented a decompression method for #{encoding.inspect} encoding."
      end
    end

    def none
      body
    end

    def brotli
      return nil unless defined?(::Brotli)
      begin
        ::Brotli.inflate(body)
      rescue StandardError
        nil
      end
    end

    def lzw
      begin
        if defined?(::LZWS::String)
          ::LZWS::String.decompress(body)
        elsif defined?(::LZW::Simple)
          ::LZW::Simple.new.decompress(body)
        end
      rescue StandardError
        nil
      end
    end
  end
end
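
The class above maps a `Content-Encoding` value to a handler method through a frozen lookup table and returns `nil` when it cannot decompress. The stand-in below sketches the same table-dispatch idea using only the standard library. `MiniDecompressor` is hypothetical; it handles `gzip` itself only so the sketch can run without extra gems (the real class leaves `gzip` to `Net::HTTP`), and its `strip.downcase` normalization is an assumption that the PR's exact-match lookup does not make.

```ruby
require 'zlib'

# Hypothetical stand-in illustrating lookup-table dispatch; not HTTParty code.
class MiniDecompressor
  HANDLERS = {
    'identity' => :none,
    'gzip' => :gzip # assumption: handled here only so the sketch runs on stdlib
  }.freeze

  def initialize(body, encoding)
    @body = body
    @encoding = encoding
  end

  # Returns the decompressed String, or nil when decompression is not possible.
  def decompress
    return nil if @body.nil?
    return @body if @encoding.nil? || @encoding.strip.empty?

    handler = HANDLERS[@encoding.strip.downcase]
    return nil unless handler # unknown encoding

    send(handler)
  end

  private

  def none
    @body
  end

  def gzip
    Zlib.gunzip(@body)
  rescue Zlib::Error
    nil # corrupt or non-gzip data is reported as "could not decompress"
  end
end

puts MiniDecompressor.new(Zlib.gzip('hello'), 'gzip').decompress
puts MiniDecompressor.new('opaque bytes', 'br').decompress.inspect
```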
8 changes: 5 additions & 3 deletions lib/httparty/parser.rb
@@ -144,9 +144,11 @@ def supports_format?
     end

     def parse_supported_format
-      send(format)
-    rescue NoMethodError => e
-      raise NotImplementedError, "#{self.class.name} has not implemented a parsing method for the #{format.inspect} format.", e.backtrace
+      if respond_to?(format, true)
+        send(format)
+      else
+        raise NotImplementedError, "#{self.class.name} has not implemented a parsing method for the #{format.inspect} format."
+      end
     end
   end
end
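
The parser change swaps a blanket `rescue NoMethodError` for an explicit `respond_to?` check. One plausible motivation (an inference, not something the PR states) is that the rescue also catches a `NoMethodError` raised inside an existing handler and reports it as a missing handler. The hypothetical classes below sketch that difference.

```ruby
# Hypothetical classes for illustration; neither is HTTParty's Parser.
class RescuingDispatcher
  def call(name)
    send(name)
  rescue NoMethodError
    raise NotImplementedError, "no handler for #{name.inspect}"
  end

  private

  def buggy
    nil.upcase # a genuine bug inside an existing handler
  end
end

class CheckingDispatcher
  def call(name)
    if respond_to?(name, true)
      send(name)
    else
      raise NotImplementedError, "no handler for #{name.inspect}"
    end
  end

  private

  def buggy
    nil.upcase # same bug as above
  end
end

# Runs the block and returns the class of the exception it raised, if any.
def error_class
  yield
  nil
rescue Exception => e # NotImplementedError is a ScriptError, not a StandardError
  e.class
end

puts error_class { RescuingDispatcher.new.call(:buggy) }  # NotImplementedError (bug is masked)
puts error_class { CheckingDispatcher.new.call(:buggy) }  # NoMethodError (bug surfaces)
puts error_class { CheckingDispatcher.new.call(:absent) } # NotImplementedError
```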
29 changes: 25 additions & 4 deletions lib/httparty/request.rb
@@ -229,6 +229,8 @@ def setup_raw_request
@raw_request.body = body.call
end

@raw_request.instance_variable_set(:@decode_content, decompress_content?)

if options[:basic_auth] && send_authorization_header?
@raw_request.basic_auth(username, password)
@credentials_sent = true
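
The `instance_variable_set(:@decode_content, ...)` line in this hunk overrides a flag that `Net::HTTP` request objects normally manage themselves. As a sketch of that underlying behaviour (as I understand Ruby's standard library; exact header values may vary between Ruby versions), a request built without an `Accept-Encoding` header gets a default one and has decoding enabled, while a request with an explicit header has decoding left off. The path `/test.json` is just a sample value, and no network request is made.

```ruby
require 'net/http'

# No Accept-Encoding given: Net::HTTP adds its own default and enables decoding.
default_req = Net::HTTP::Get.new('/test.json')
puts default_req['accept-encoding']
puts default_req.decode_content.inspect

# Explicit Accept-Encoding: Net::HTTP leaves automatic decoding off.
custom_req = Net::HTTP::Get.new('/test.json', 'Accept-Encoding' => 'gzip')
puts custom_req.decode_content.inspect

# What the hunk above does when skip_decompression is not set:
# force the flag on so gzip/deflate responses are still decoded.
custom_req.instance_variable_set(:@decode_content, true)
puts custom_req.decode_content.inspect
```

Setting an instance variable on a standard-library object relies on an internal detail, so this approach could break if `Net::HTTP` changes how it tracks the flag.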
@@ -240,6 +242,10 @@ def digest_auth?
!!options[:digest_auth]
end

def decompress_content?
!options[:skip_decompression]
end

def response_unauthorized?
!!last_response && last_response.code == '401'
end
@@ -271,7 +277,7 @@ def assume_utf16_is_big_endian
       options[:assume_utf16_is_big_endian]
     end

-    def handle_response(body, &block)
+    def handle_response(raw_body, &block)
       if response_redirects?
         options[:limit] -= 1
         if options[:logger]
@@ -292,9 +298,20 @@ def handle_response(body, &block)
         capture_cookies(last_response)
         perform(&block)
       else
-        body ||= last_response.body
-        body = body.nil? ? body : encode_text(body, last_response['content-type'])
-        Response.new(self, last_response, lambda { parse_response(body) }, body: body)
+        raw_body ||= last_response.body
+
+        body = decompress(raw_body, last_response['content-encoding']) unless raw_body.nil?
+
+        unless body.nil?
+          body = encode_text(body, last_response['content-type'])
+
+          if decompress_content?
+            last_response.delete('content-encoding')
+            raw_body = body
+          end
+        end
+
+        Response.new(self, last_response, lambda { parse_response(body) }, body: raw_body)
       end
     end

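
The new `handle_response` branch tracks two values: the body handed to the parser and the body exposed on the response. The sketch below restates that bookkeeping as a pure function under simplifying assumptions: `resolve_bodies`, its keyword names, and the toy `rev` encoding are all hypothetical, and the text re-encoding step (`encode_text`) is left out.

```ruby
# Hypothetical restatement of the branch above; names are illustrative only.
# `decompress` is any callable taking (raw_body, encoding) and returning a String or nil.
def resolve_bodies(raw_body, headers, decompress_content:, decompress:)
  headers = headers.dup
  body = raw_body.nil? ? nil : decompress.call(raw_body, headers['content-encoding'])

  # Only when decompression succeeded and was not skipped does the caller
  # see the decoded body, with Content-Encoding removed.
  if !body.nil? && decompress_content
    headers.delete('content-encoding')
    raw_body = body
  end

  { parse_input: body, response_body: raw_body, headers: headers }
end

# Toy codec so the sketch runs without extra gems: "rev" reverses the string,
# no encoding passes data through, anything else is unsupported (nil).
toy = lambda do |data, encoding|
  case encoding
  when nil then data
  when 'rev' then data.reverse
  end
end

wire = { 'content-encoding' => 'rev' }
p resolve_bodies('cba', wire, decompress_content: true, decompress: toy)
p resolve_bodies('cba', wire, decompress_content: false, decompress: toy)
p resolve_bodies('cba', { 'content-encoding' => 'zzz' }, decompress_content: true, decompress: toy)
```

In the unsupported case the parse input is `nil` while the response body and header stay untouched, which matches the README's note that `parsed_response` is `nil` for encodings HTTParty cannot handle.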
@@ -370,6 +387,10 @@ def set_basic_auth_from_uri
end
end

def decompress(body, encoding)
Decompressor.new(body, encoding).decompress
end

def encode_text(text, content_type)
TextEncoder.new(
text,