BitShuffle is a program for encoding and decoding arbitrary binary data into
printable ASCII characters for transfer over arbitrary media. In many respects,
it can fill the same purpose as base64 or uudecode/uuencode; however, it is more
sophisticated than these tools. Some key features that BitShuffle offers
include:
- Automatic chunking of data into arbitrary sizes.
- Automatic checksumming of data
- Automatic compression of data (bzip2 and gzip are both supported, see #2)
- Support for both Python 2 and 3
- The use case which spawned the project in the first place: copying small files over an existing interactive ssh session without needing to re-authenticate, as would be required when using scp.
- Transferring arbitrary files over chat programs which either don't allow attachments, or which restrict what file types are allowed. For example, sending a small script to a friend over GroupMe.
- Embedding arbitrary binary data in program logs (as an example, one of BitShuffle's authors once used a spiritual precursor to BitShuffle to pickle and embed live Python objects into a program's debug log for later interactive debugging).
- Sending e-mail attachments across e-mail servers that don't allow certain file types/extensions (email attachments are really just base64 encoded data anyway, but BitShuffle would avoid inspection by most mail services).
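The log-embedding use case above can be illustrated with a minimal sketch that uses only the Python standard library. It does not use BitShuffle or its packet format; the `[[OBJ:` and `]]` markers and the `embed`/`extract` helper names are hypothetical, chosen only for this illustration.

```python
import base64
import bz2
import pickle

# Hypothetical markers used to find the blob inside a log line.
MARKER_OPEN = "[[OBJ:"
MARKER_CLOSE = "]]"


def embed(obj):
    """Pickle, compress, and base64-encode obj into a printable token."""
    blob = base64.b64encode(bz2.compress(pickle.dumps(obj)))
    return MARKER_OPEN + blob.decode("ascii") + MARKER_CLOSE


def extract(line):
    """Recover the object from a log line containing an embedded token."""
    start = line.index(MARKER_OPEN) + len(MARKER_OPEN)
    end = line.index(MARKER_CLOSE, start)
    return pickle.loads(bz2.decompress(base64.b64decode(line[start:end])))
```

A log call such as `logger.debug("state=%s", embed(state))` would then leave a line that `extract` can read back later. Unpickling executes arbitrary code, so this should only be applied to logs from a trusted source.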
These services are inconvenient to use for very small or transient files, e.g. "let me show you this cool shell script I wrote", or "here, look at this 10-line log file".
These services are designed specifically for transferring plain text data, and often mangle binary data. They usually have size limitations as well.
The authors of BitShuffle find it useful. Maybe you will too. Maybe not.
The number of automated tests may seem high for a project as small as
BitShuffle. However, BitShuffle is intended to be a tool used on a daily
basis (as it is by its authors), inside of pipelines, and possibly inside of
other automation. It is therefore critical that it not break or behave in
strange or unusual ways, for the same reason ls must not break on weird edge
cases: it is used too frequently.
Yes, but please wait until we have a stable release. The data packet format may change without warning until there is at least one stable release.
Not at this time, but it will in the future as the project matures a bit. Until then use BitShuffle as a Python module at your own risk.
To install/run BitShuffle:
- POSIX-ish operating system, or Windows (as of #38).
- Python (>= 2.7)
To run BitShuffle's automated tests locally:
- POSIX sh compliant shell interpreter
- uuidgen
- travis (hint: gem install travis)
- bc
- /tmp must exist and be write-able
- pycodestyle
Simply run python ./setup.py install.
(Note: this assumes which python is identical to python.)
If you are only going to be using BitShuffle as a script, not as a Python
module, you can also just drop bitshuffle/bitshuffle.py into $PATH (I suggest
symlinking to ~/bin/bitshuffle).
Binary releases for various platforms are available via the GitHub releases
page. At present, builds are available for Linux and Windows as static
binaries, which can be dropped anywhere in $PATH without requiring Python to be
installed.
macOS ships with Python in the default install, and the version it provides is sufficient to run BitShuffle. Consequently, no static build is provided for macOS at this time.
Contributions are welcome! Simply open a GitHub pull request. All contributions
need to pass the automated TravisCI checks, most of which are available as a
script (I recommend symlinking scripts/pre-commit to .git/hooks/).
If you would like to contribute by sending patches over e-mail, that is fine too, just get in touch with @charlesdaniels.
A BitShuffle data packet is a sequence of ASCII text. A data packet may be arbitrarily long. A data packet may contain arbitrary whitespace, which is stripped during processing.
A BitShuffle packet is surrounded by special sigil characters:
- It is preceded by the string literal ((<< (opening token)
- It is succeeded by the string literal >>)) (closing token)
These string literals are deliberately selected to avoid common markup
characters, such as #, @, and *, which are frequently used by messaging
services to denote special formatting for messages.
The data packet comprises several segments. A segment begins with either the
opening token or the | character. A segment ends with either the closing token
or a | character. A segment may contain only the characters a-zA-Z0-9, as well
as =, :, /, +, and -. Again, keep in mind that whitespace is ignored entirely.
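The framing rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BitShuffle's actual decoder: the function name `split_segments` is hypothetical, and the character check applies the list above literally, so the real decoder may well be more lenient.

```python
import re

OPENING_TOKEN = "((<<"
CLOSING_TOKEN = ">>))"
# Characters permitted inside a segment, per the description above.
# \Z anchors the match at the end of the string.
SEGMENT_RE = re.compile(r"[A-Za-z0-9=:/+\-]*\Z")


def split_segments(packet):
    """Split one packet into its segments, ignoring all whitespace."""
    compact = "".join(packet.split())  # drop spaces, tabs, and newlines
    if not (compact.startswith(OPENING_TOKEN)
            and compact.endswith(CLOSING_TOKEN)):
        raise ValueError("missing opening or closing token")
    body = compact[len(OPENING_TOKEN):-len(CLOSING_TOKEN)]
    segments = body.split("|")
    for segment in segments:
        if not SEGMENT_RE.match(segment):
            raise ValueError("illegal character in segment: %r" % segment)
    return segments
```

For example, `split_segments("((<< a | b\n| c >>))")` yields the three segments `a`, `b`, and `c`, because whitespace is removed before the tokens and separators are examined.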
The data packet contains the following segments, in order:
- Message indicating that this is a BitShuffle data packet, with a link to download BitShuffle. Note that the decoder does not support line breaks in this segment (see #10).
- BitShuffle data packet format compatibility level (currently 1).
- BitShuffle data encoding format (currently base64).
- BitShuffle data compression type (currently either bz2 or gzip).
- BitShuffle packet sequence number (e.g. 23).
- BitShuffle packet sequence end (the number of packets in the message).
- BitShuffle data checksum (encoded)
- BitShuffle data chunk (encoded)
Segments marked as encoded contain arbitrary data which has been compressed with the specified compression type, and then encoded with the specified encoding format.
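To make the segment order concrete, here is a minimal sketch of reading one packet and recovering its data chunk. The names `Packet`, `parse_packet`, and `decode_payload` are hypothetical and are not BitShuffle's API. The sketch assumes that "gzip" means the standard gzip container, and it does not verify the checksum, since the checksum algorithm is not described here.

```python
import base64
import bz2
import zlib
from collections import namedtuple

# Field names are illustrative; the order follows the list above.
Packet = namedtuple("Packet", [
    "message", "compat_level", "encoding", "compression",
    "seq_num", "seq_end", "checksum", "chunk",
])


def parse_packet(text):
    """Split a packet into its eight segments (whitespace is ignored)."""
    compact = "".join(text.split())
    if not (compact.startswith("((<<") and compact.endswith(">>))")):
        raise ValueError("not a BitShuffle packet")
    fields = compact[4:-4].split("|")
    if len(fields) != 8:
        raise ValueError("expected 8 segments, got %d" % len(fields))
    message, compat, encoding, compression, num, end, checksum, chunk = fields
    return Packet(message, int(compat), encoding, compression,
                  int(num), int(end), checksum, chunk)


def decode_payload(packet):
    """Undo the encoding first, then the compression, of the data chunk."""
    if packet.encoding != "base64":
        raise ValueError("unsupported encoding: %s" % packet.encoding)
    raw = base64.b64decode(packet.chunk)
    if packet.compression == "bz2":
        return bz2.decompress(raw)
    if packet.compression == "gzip":
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    raise ValueError("unsupported compression: %s" % packet.compression)
```

A multi-packet message would presumably be reassembled by sorting the decoded chunks by sequence number and concatenating them, but that step is omitted here because the numbering convention is not specified above.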
Note that the data packet spec is liable to change without warning in non-release versions of BitShuffle. Any changes made since the last release will result in a compatibility level bump at time of release. Use non-release versions at your own risk.
BitShuffle is tested automatically by multiple CI systems (AppVeyor and
TravisCI), executing a large battery of tests to ensure it is functioning
correctly. These scripts are implemented in POSIX sh, and are stored in the
scripts/ directory. A subset of these tests that are safe to run locally (they
do not modify the disk or require sudo) can be executed with the script
scripts/pre_commit_check.sh. For convenience, only one version of Python is
tested locally. Contributors should not open PRs for code that does not pass
this script.
Note that Windows support is tested via a PowerShell script, which is intended to run only on AppVeyor. It executes only a few very simple smoke tests that ensure the program can run successfully on Windows, but does not exhaustively test every feature.
Most of BitShuffle's tests are end-to-end/blackbox tests that aim to validate real-world use cases. At this time, BitShuffle is too small and monolithic for actual unit tests to be of value. In the future, a stable public API will be defined, at which time comprehensive unit tests will need to be written to avoid regressions (see #39, #5).
In addition to automated functionality tests, we also adhere strictly to PEP8, which is enforced by pycodestyle.
BitShuffle loosely follows Semantic Versioning. The following suffixes are used:
- No suffix - implies this is a stable release.
- -git - this version is from the BitShuffle git repository, and probably has not been tested.
- -RCX - this is the Xth release candidate for the relevant version.
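The suffix scheme above could be recognized with a small helper like the one below. This is a sketch under assumptions: it presumes version strings of the form MAJOR.MINOR.PATCH followed by an optional suffix (for instance 0.0.1-RC2), which is not stated explicitly here, and the name `classify_version` is hypothetical.

```python
import re

# Assumed shape: three dot-separated numbers, then an optional
# "-git" or "-RC<number>" suffix.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-(git|RC(\d+)))?\Z")


def classify_version(version):
    """Return a short description of the kind of release a version names."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    suffix, rc_number = match.group(4), match.group(5)
    if suffix is None:
        return "stable"
    if suffix == "git":
        return "git"
    return "release candidate %d" % int(rc_number)
```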