bencode.hpp is a small, single-header C++ library for parsing and generating bencoded data. You might find it useful as an (extremely!) simple library for serializing data from your program.
This library has no external dependencies and only requires a C++20 compiler. It's been tested on Clang 10+, GCC 10+, and MSVC 2022+. The unit tests do depend on mettle, however.
Note: if Boost is installed, bencode.hpp will provide the ability to use
`boost::variant`, which can perform significantly better than `std::variant` on
some data sets (up to 2x faster than libstdc++ or libc++ when decoding
integers).
If you're using Ubuntu (or a similar distro), you can install bencode from the following PPA: ppa:jimporter/stable. If you're not using Ubuntu, you can also build from source using bfg9000. Just run the following:
$ cd /path/to/bencode.hpp/
$ 9k build/
$ cd build/
$ ninja install
However, since bencode.hpp is a single-file, header-only library, you can just
copy `include/bencode.hpp` to your destination of choice. (Note that doing this
won't generate a `bencodehpp.pc` file for `pkg-config` to use.)
Bencode has four data types: `integer`, `string`, `list`, and `dict`. These
correspond to `long long`, `std::string`, `std::vector<bencode::data>`, and
`std::map<std::string, bencode::data>`, respectively. Since the data types are
determined at runtime, these are all stored in a variant type called `data` (a
subclass of `std::variant`).
Note: Technically, `bencode::dict` is a `map_proxy` object, since `std::map`
doesn't support holding elements of incomplete type (though some implementations
do allow this). This type has all the member functions you'd expect, as well as
overloaded `*` and `->` operators to access the proxied `std::map` directly.
However, you can customize this if you like.
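To make the incomplete-type problem concrete, here is a minimal sketch of how a proxy of this kind can work. Every name in it (`simple_map_proxy`, `node`, `node_dict`) is hypothetical and chosen for illustration; it is not bencode.hpp's actual `map_proxy`, only one plausible way to get the same effect with the standard library.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-in for a map proxy; not the library's implementation.
// The map lives behind a pointer, so `Value` only has to be complete when a
// member function is actually used, not when the proxy type is named.
template<typename Key, typename Value>
class simple_map_proxy {
public:
  using map_type = std::map<Key, Value>;

  simple_map_proxy() : impl_(std::make_unique<map_type>()) {}
  simple_map_proxy(const simple_map_proxy &other)
    : impl_(std::make_unique<map_type>(*other.impl_)) {}
  simple_map_proxy & operator =(const simple_map_proxy &other) {
    *impl_ = *other.impl_;
    return *this;
  }

  // Forward a couple of common members...
  Value & operator [](const Key &key) { return (*impl_)[key]; }
  std::size_t size() const { return impl_->size(); }

  // ...and expose the proxied map for everything else.
  map_type & operator *() { return *impl_; }
  const map_type & operator *() const { return *impl_; }
  map_type * operator ->() { return impl_.get(); }
  const map_type * operator ->() const { return impl_.get(); }

private:
  std::unique_ptr<map_type> impl_;  // never null in this sketch
};

// A recursive value type: `node` is still incomplete where `node_dict` is
// named, which is exactly the situation the proxy is meant to handle.
struct node;
using node_dict = simple_map_proxy<std::string, node>;

struct node : std::variant<long long, std::string, node_dict> {
  using base_type = std::variant<long long, std::string, node_dict>;
  using base_type::base_type;
};
```

With this in place, `node_dict d; d["answer"] = 42LL;` stores an integer, and `d->count("answer")` reaches the underlying `std::map` through the `->` operator.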
Decoding bencoded data is simple. Just call `decode` with a string or some other
container holding character data. This will return a `data` object that you can
operate on:
bencode::data data = bencode::decode("i42e");
auto value = std::get<bencode::integer>(data);
`decode` also has overloads that take an iterator pair or a pointer and length:
auto data1 = bencode::decode(foo.begin(), foo.end());
auto data2 = bencode::decode(c_str, std::strlen(c_str));
Finally, you can pass an `std::istream` directly to `decode`. By default, this
overload will set the eof bit on the stream if it reaches the end. However, you
can override this behavior:
// Defaults to bencode::check_eof.
auto data = bencode::decode(stream, bencode::no_check_eof);
This option is useful if, for instance, you're reading multiple bencoded messages from a pipe, which brings us to...
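As a rough illustration of what such an eof check means for a stream, the sketch below reads a single bencoded integer from an `std::istream` and optionally peeks one character past it. The function `read_integer` and its `bool` parameter are hypothetical; whether bencode.hpp implements its check this way is an assumption, not something this document states.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <stdexcept>

// Hypothetical helper, not part of bencode.hpp: read "i<digits>e" from a
// stream. Negative numbers and overflow are ignored to keep the sketch short.
long long read_integer(std::istream &in, bool check_eof) {
  if(in.get() != 'i')
    throw std::invalid_argument("expected 'i'");

  long long value = 0;
  bool any_digits = false;
  while(std::isdigit(in.peek())) {
    value = value * 10 + (in.get() - '0');
    any_digits = true;
  }
  if(!any_digits)
    throw std::invalid_argument("expected digit");
  if(in.get() != 'e')
    throw std::invalid_argument("expected 'e'");

  // Reading the final 'e' does not set eofbit by itself; peeking past it
  // does, but only if nothing else is left in the stream.
  if(check_eof)
    in.peek();
  return value;
}
```

Skipping the peek matters for pipes and sockets, where looking for one more character can block until the next message arrives.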
One convenient feature of bencoded data is that it's possible to concatenate
successive objects in the same string or stream, and readers can always tell
where one ends and the next begins. While `decode` will consume all the input
(or throw an exception if there's any extraneous data), `decode_some` will let
you parse just the next bencoded object, leaving any extra data for the next
call:
std::stringstream input("i42e3:foo");
auto data1 = bencode::decode_some(input); // contains 42
auto data2 = bencode::decode_some(input); // contains "foo"
When calling `decode_some` with an iterator pair, it will update the value of
the "begin" iterator in-place to point to where the parsing left off. Similarly,
when calling `decode_some` with a pointer or pointer/length, it will update the
pointer's value in-place.
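The reason a reader can always find the boundary is that every bencoded object is self-delimiting: integers end with `e` and strings carry their length up front. The following sketch shows the idea for those two types only, advancing a "begin" iterator in-place. The names `scalar`, `is_digit`, and `parse_one` are hypothetical and this is not how `decode_some` is implemented; lists, dicts, and overflow checks are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical result type covering only integers and strings.
using scalar = std::variant<long long, std::string>;

inline bool is_digit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: parse one object and leave `begin` just past it.
template<typename Iter>
scalar parse_one(Iter &begin, Iter end) {
  if(begin == end)
    throw std::invalid_argument("unexpected end of input");

  if(*begin == 'i') {  // integer: i<digits>e
    ++begin;
    const bool negative = begin != end && *begin == '-';
    if(negative)
      ++begin;
    if(begin == end || !is_digit(*begin))
      throw std::invalid_argument("expected digit");
    long long value = 0;
    for(; begin != end && is_digit(*begin); ++begin)
      value = value * 10 + (*begin - '0');
    if(begin == end || *begin != 'e')
      throw std::invalid_argument("expected 'e'");
    ++begin;
    return negative ? -value : value;
  }

  if(is_digit(*begin)) {  // string: <length>:<bytes>
    std::size_t length = 0;
    for(; begin != end && is_digit(*begin); ++begin)
      length = length * 10 + static_cast<std::size_t>(*begin - '0');
    if(begin == end || *begin != ':')
      throw std::invalid_argument("expected ':'");
    ++begin;
    std::string text;
    for(; length != 0; --length, ++begin) {
      if(begin == end)
        throw std::invalid_argument("unexpected end of string");
      text.push_back(*begin);
    }
    return text;
  }

  throw std::invalid_argument("unexpected type token");
}
```

Calling `parse_one` twice on the same iterator over `"i42e3:foo"` yields the integer first and the string second, with the iterator ending up at the end of the input.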
If the buffer holding the bencoded data is stable (i.e. won't change or be
destroyed until you're done working with the parsed representation), you can
decode the data as a view on the buffer to save memory. This results in all
parsed strings being nothing more than pointers pointing to slices of your
buffer. Simply add `_view` to the functions/types to take advantage of this:
std::string buf = "3:foo";
bencode::data_view data = bencode::decode_view(buf); // or `decode_view_some`
auto value = std::get<bencode::string_view>(data);
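To see why the buffer has to outlive the parsed data, consider this small sketch. It parses a single bencoded string and returns a `std::string_view` that points straight into the caller's buffer, copying nothing. The function `parse_string_view` is a hypothetical illustration and not part of bencode.hpp.

```cpp
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse "<length>:<bytes>" and return a view of the
// bytes. The result aliases `buf`, so it dangles if the buffer goes away.
std::string_view parse_string_view(std::string_view buf) {
  const char *first = buf.data();
  const char *last = first + buf.size();

  std::size_t length = 0;
  const auto [ptr, ec] = std::from_chars(first, last, length);
  if(ec != std::errc() || ptr == last || *ptr != ':')
    throw std::invalid_argument("expected '<length>:'");

  const std::size_t start = static_cast<std::size_t>(ptr - first) + 1;
  if(buf.size() - start < length)
    throw std::invalid_argument("string is longer than the buffer");
  return buf.substr(start, length);
}
```

For `std::string buf = "3:foo";`, the returned view compares equal to `"foo"` and its `data()` pointer is `buf.data() + 2`, i.e. a slice of the original buffer.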
If there's an error trying to decode some bencode data, a `decode_error` will be
thrown. This provides information about where the error occurred via the
`offset()` member function, as well as access to the underlying exception that
caused the error, via either `nested_ptr()` or `rethrow_nested()`:
try {
  auto data = bencode::decode(input);
} catch(const bencode::decode_error &e) {
  // Throw the underlying exception. Maybe catch it and do something with it.
  e.rethrow_nested();
}
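The member names `nested_ptr()` and `rethrow_nested()` are the ones provided by the standard `std::nested_exception`. The sketch below shows how an error type can combine an offset with a nested exception using only the standard library. The class `offset_error` and the functions `parse_integer` and `nested_message` are hypothetical, and the sketch does not claim to match how `decode_error` is actually defined.

```cpp
#include <cctype>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical error type: carries an offset and, through
// std::nested_exception, whatever exception was active when it was created.
class offset_error : public std::runtime_error, public std::nested_exception {
public:
  offset_error(const std::string &what, std::size_t offset)
    : std::runtime_error(what), offset_(offset) {}
  std::size_t offset() const noexcept { return offset_; }
private:
  std::size_t offset_;
};

// Hypothetical parser for non-negative "i<digits>e" (no overflow checks).
long long parse_integer(const std::string &text) {
  std::size_t pos = 0;
  try {
    if(pos == text.size() || text[pos] != 'i')
      throw std::invalid_argument("expected 'i'");
    ++pos;
    long long value = 0;
    bool any_digits = false;
    for(; pos != text.size() &&
          std::isdigit(static_cast<unsigned char>(text[pos])); ++pos) {
      value = value * 10 + (text[pos] - '0');
      any_digits = true;
    }
    if(!any_digits)
      throw std::invalid_argument("expected digit");
    if(pos == text.size() || text[pos] != 'e')
      throw std::invalid_argument("expected 'e'");
    return value;
  } catch(...) {
    // Constructed inside the handler, so the nested_exception base captures
    // the exception currently being handled.
    throw offset_error("failed to parse integer", pos);
  }
}

// Recover the message of the underlying exception.
std::string nested_message(const offset_error &e) {
  try {
    e.rethrow_nested();
  } catch(const std::exception &inner) {
    return inner.what();
  } catch(...) {
    return "unknown error";
  }
}
```

Parsing `"i4x"` with this sketch throws an `offset_error` whose `offset()` is 2 and whose nested exception is the `std::invalid_argument` complaining about the missing `e`.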
Once you have a `data` (or `data_view`) object, it's easy to read from it. For
simple cases, you can just use `std::get` to retrieve the value out of the
variant:
auto data = bencode::decode("i42e");
auto value = std::get<bencode::integer>(data);
In addition, you can use the `operator []` or `at` member functions to get the
requested element from a `list` value (if you pass an integer) or `dict` value
(if you pass a string):
auto data = bencode::decode("d3:fooi42ee");
auto elem = data["foo"];
auto value = std::get<bencode::integer>(elem);
These member functions simply forward on to the corresponding functions for the underlying container, and are (roughly) equivalent to:
auto elem = std::get<bencode::dict>(data)["foo"];
Since the `bencode::data` type is simply a subclass of `std::variant` (likewise
`bencode::data_view`), you can usually just call `std::visit` on it.
Unfortunately, due to a quirk in the C++ specification (resolved in C++23), not
all standard libraries support passing `bencode::data` to `std::visit`. To get
around this issue, you can call the `base()` method to cast `bencode::data` to a
`std::variant`:
std::visit(visitor_fn, my_data.base());
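The same workaround can be demonstrated on any type derived from `std::variant`. In the sketch below, `value` and `describe` are hypothetical names used only for illustration; the `base()` accessor mirrors the idea described above by handing `std::visit` a reference to the plain `std::variant` base.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical variant subclass with a base() accessor.
struct value : std::variant<long long, std::string> {
  using base_type = std::variant<long long, std::string>;
  using base_type::base_type;

  base_type & base() { return *this; }
  const base_type & base() const { return *this; }
};

// Visit through base() so the call works even where std::visit does not
// accept types derived from std::variant.
std::string describe(const value &v) {
  return std::visit([](const auto &x) -> std::string {
    using T = std::decay_t<decltype(x)>;
    if constexpr(std::is_same_v<T, long long>)
      return "integer " + std::to_string(x);
    else
      return "string " + x;
  }, v.base());
}
```

Here `describe(value(42LL))` produces `"integer 42"` and `describe(value(std::string("foo")))` produces `"string foo"`.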
Encoding data is also straightforward:
// Encode and store the result in an std::string.
auto str = bencode::encode(42);
// Encode and output to an std::ostream.
bencode::encode_to(std::cout, 42);
// Encode and output to an iterator.
std::vector<char> vec;
bencode::encode_to(std::back_inserter(vec), 42);
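For readers unfamiliar with the wire format, this is roughly what encoding to an output iterator involves for the two scalar types: integers become `i<digits>e` and strings become `<length>:<bytes>`. The functions `write_integer` and `write_string` are hypothetical helpers written for this sketch and are not bencode.hpp's encoder.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: write an integer as "i<digits>e".
template<typename OutIt>
OutIt write_integer(OutIt out, long long value) {
  const std::string digits = std::to_string(value);
  *out++ = 'i';
  out = std::copy(digits.begin(), digits.end(), out);
  *out++ = 'e';
  return out;
}

// Hypothetical helper: write a string as "<length>:<bytes>".
template<typename OutIt>
OutIt write_string(OutIt out, std::string_view value) {
  const std::string length = std::to_string(value.size());
  out = std::copy(length.begin(), length.end(), out);
  *out++ = ':';
  return std::copy(value.begin(), value.end(), out);
}
```

Writing `42` and then `"foo"` through a `std::back_inserter` on an empty `std::string` leaves it holding `i42e3:foo`.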
You can also construct more-complex data structures:
bencode::encode_to(std::cout, bencode::dict{
  {"one", 1},
  {"two", bencode::list{1, "foo", 2}},
  {"three", "3"}
});
As with decoding, you can use the `*_view` types if you know the underlying
memory will live until the encoding function returns.
If Boost is installed, bencode.hpp will provide functions to decode data into a
`boost::variant`. This can be particularly useful for some data sets, since
`boost::variant` is consistently faster than most `std::variant`
implementations, especially when storing integers.
These functions work the same as the regular bencode.hpp versions, but are
prefixed with `boost_`:
bencode::boost_data d = bencode::boost_decode(msg);
bencode::boost_data_view dv = bencode::boost_decode_view(msg);
// ...
In addition to using the built-in data types `bencode::data` and
`bencode::data_view`, you can define your own with the `bencode::basic_data`
class template. This can be useful if you want different alternative types in
your variant (e.g. using `std::map` instead of `bencode::map_proxy` if your
standard library supports that) or to use a different variant type altogether:
using cool_data = bencode::basic_data<
  cool_variant, long long, std::string, std::vector, bencode::map_proxy
>;
auto result = bencode::basic_decode<cool_data>(message);
Note that when using a different variant type, you'll likely want to create a
specialization of `bencode::variant_traits` so that bencode.hpp knows how to
call the visitor function for your type:
template<>
struct bencode::variant_traits<cool_variant> {
  template<typename Visitor, typename ...Variants>
  static decltype(auto) visit(Visitor &&visitor, Variants &&...variants) {
    return cool_visit(std::forward<Visitor>(visitor),
                      std::forward<Variants>(variants).base()...);
  }
  template<typename Type, typename Variant>
  inline static decltype(auto) get_if(Variant *variant) {
    return cool_get_if<Type>(&variant->base());
  }
  template<typename Variant>
  inline static auto index(const Variant &variant) {
    return variant.cool_index();
  }
};
This library is licensed under the BSD 3-Clause license.