Python implementation of Bluesky PDS and AT Protocol, including data repository, Merkle search tree, and XRPC methods.
You can build your own PDS on top of arroba with just a few lines of Python and run it in any WSGI server. You can build a more involved PDS with custom logic and behavior. Or you can build a different ATProto service, eg an AppView, relay (née BGS), or something entirely new!
Install from PyPI with `pip install arroba`.
Arroba is the Spanish word for the @ character ("at sign").
License: This project is placed in the public domain. You may also use it under the CC0 License.
Here's minimal example code for a multi-repo PDS on top of arroba and Flask:
from flask import Flask
from google.cloud import ndb
from lexrpc.flask_server import init_flask
from arroba import server
from arroba.datastore_storage import DatastoreStorage
from arroba.xrpc_sync import send_events
# for Google Cloud Datastore
ndb_client = ndb.Client()
server.storage = DatastoreStorage(ndb_client=ndb_client)
server.repo.callback = lambda _: send_events() # to subscribeRepos
app = Flask('my-pds')
init_flask(server.server, app)
# run every request inside an ndb context so that handlers can use the datastore
def ndb_context_middleware(wsgi_app):
    def wrapper(environ, start_response):
        with ndb_client.context():
            return wsgi_app(environ, start_response)
    return wrapper
app.wsgi_app = ndb_context_middleware(app.wsgi_app)
See `app.py` for a more comprehensive example, including a CORS handler for `OPTIONS` preflight requests and a catch-all `app.bsky.*` XRPC handler that proxies requests to the AppView.
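As a rough illustration of the CORS preflight handling mentioned above, here is a minimal sketch of a WSGI middleware written with only the standard library. It is not the handler from `app.py`; the name `cors_middleware`, the header values, and the `204` response are assumptions chosen for the example.

```python
def cors_middleware(wsgi_app, allow_origin='*'):
    """Wrap a WSGI app so that it answers CORS preflight requests itself.

    Hypothetical helper for illustration; not part of arroba.
    """
    cors_headers = [
        ('Access-Control-Allow-Origin', allow_origin),
        ('Access-Control-Allow-Methods', 'GET, POST, OPTIONS'),
        ('Access-Control-Allow-Headers', '*'),
    ]

    def wrapper(environ, start_response):
        if environ.get('REQUEST_METHOD') == 'OPTIONS':
            # Preflight: reply right away, without calling the wrapped app.
            start_response('204 No Content', cors_headers)
            return [b'']

        def start_with_cors(status, headers, exc_info=None):
            # Add the CORS headers to every normal response.
            return start_response(status, list(headers) + cors_headers, exc_info)

        return wsgi_app(environ, start_with_cors)

    return wrapper
```

It can be stacked the same way as the ndb middleware, eg `app.wsgi_app = cors_middleware(app.wsgi_app)`.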
Arroba consists of these parts:
- Data structures:
- Storage:
  - `Storage` abstract base class
  - `DatastoreStorage` (uses Google Cloud Datastore)
  - TODO: filesystem storage
- XRPC handlers:
- Utilities:
  - `did`: create and resolve `did:plc`s, `did:web`s, and domain handles
  - `diff`: find the deterministic minimal difference between two `MST`s
  - `util`: miscellaneous utilities for TIDs, AT URIs, signing and verifying signatures, generating JWTs, encoding/decoding, and more
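To give a feel for the kind of AT URI handling that the utilities cover, here is a small standard-library sketch that splits an `at://` URI into its parts. The pattern is deliberately simplified, and `AT_URI_RE` and `parse_at_uri` are hypothetical names for this example, not arroba's `util` API.

```python
import re

# Simplified pattern: authority (DID or handle), then an optional
# collection NSID and an optional record key.
AT_URI_RE = re.compile(
    r'^at://(?P<repo>[^/]+)(?:/(?P<collection>[^/]+)(?:/(?P<rkey>[^/]+))?)?$')


def parse_at_uri(uri):
    """Split an at:// URI into (repo, collection, rkey); missing parts are None."""
    match = AT_URI_RE.match(uri)
    if not match:
        raise ValueError(f'not an at:// URI: {uri!r}')
    return match.group('repo', 'collection', 'rkey')
```

For example, `parse_at_uri('at://snarfed.org')` yields only the repo, with `None` for the collection and record key.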
Configure arroba with these environment variables:
- `APPVIEW_HOST`, default `api.bsky-sandbox.dev`
- `RELAY_HOST`, default `bgs.bsky-sandbox.dev`
- `PLC_HOST`, default `plc.bsky-sandbox.dev`
- `PDS_HOST`, where you're running your PDS

Optional, only used in com.atproto.repo, .server, and .sync XRPC handlers:
- `REPO_TOKEN`, static token to use as both `accessJwt` and `refreshJwt`, defaults to contents of `repo_token` file. Not required to be an actual JWT. If not set, XRPC methods that require auth will return HTTP 501 Not Implemented.
- `ROLLBACK_WINDOW`, number of events to serve in the `subscribeRepos` rollback window, as an integer. Defaults to no limit.
- `SUBSCRIBE_REPOS_BATCH_DELAY`, minimum time to wait between datastore queries in `com.atproto.sync.subscribeRepos`, in seconds, as a float. Defaults to 0 if unset.
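The defaults and types listed above can be summarized in a short sketch. This is not how arroba reads its configuration internally; `load_config` and its `token_file` parameter are hypothetical names that just restate the documented behavior in code.

```python
import os


def load_config(env=None, token_file='repo_token'):
    """Read the settings described above from a mapping of environment variables.

    Hypothetical helper for illustration; not part of arroba.
    """
    env = os.environ if env is None else env

    # REPO_TOKEN falls back to the contents of the repo_token file, if any.
    token = env.get('REPO_TOKEN')
    if token is None and os.path.exists(token_file):
        with open(token_file) as f:
            token = f.read().strip()

    window = env.get('ROLLBACK_WINDOW')
    return {
        'APPVIEW_HOST': env.get('APPVIEW_HOST', 'api.bsky-sandbox.dev'),
        'RELAY_HOST': env.get('RELAY_HOST', 'bgs.bsky-sandbox.dev'),
        'PLC_HOST': env.get('PLC_HOST', 'plc.bsky-sandbox.dev'),
        'PDS_HOST': env.get('PDS_HOST'),  # no default: where your PDS runs
        'REPO_TOKEN': token,  # None: no token configured
        'ROLLBACK_WINDOW': int(window) if window else None,  # None: no limit
        'SUBSCRIBE_REPOS_BATCH_DELAY': float(
            env.get('SUBSCRIBE_REPOS_BATCH_DELAY', 0)),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to inspect, eg `load_config({'PDS_HOST': 'pds.example.com'})`.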
Breaking changes:
- `repo`:
  - `apply_commit`, `apply_writes`: raise an exception if the repo is inactive.
- `storage`:
  - `load_repo`: don't raise an exception if the repo is tombstoned.
- `util`:
  - Rename `TombstonedRepo` to `InactiveRepo`.
Non-breaking changes:
- `datastore_storage`:
  - `DatastoreStorage`:
    - Add new `ndb_context_kwargs` constructor kwarg.
    - `apply_commit`: handle deactivated repos.
    - `create_repo`: propagate `Repo.status` into `AtpRepo`.
  - `AtpRemoteBlob`:
    - Add `width` and `height` properties, populated for images, to be used in image embed `aspectRatio` (snarfed/bridgy-fed#1571).
- `xrpc_repo`:
  - `describe_server`: include all `app.bsky` collections and others like `chat.bsky.actor.declaration`; fetch and include DID doc.
Breaking changes:
- Add much more lexicon schema validation for records and XRPC method input, output, and parameters.
- `storage`:
  - Switch `Storage.write` to return `Block` instead of `CID`.
Non-breaking changes:
- `did`:
  - Add new `update_plc` method.
  - `create_plc`: add new `also_known_as` kwarg.
  - `resolve_handle`: drop `Content-Type: text/plain` requirement for HTTPS method.
- `mst`:
  - Add new optional `start` kwarg to `load_all`.
- `repo`:
  - Emit new #identity and #account events to `subscribeRepos` when creating new repos.
- `storage`:
  - Add new `deactivate_repo`, `activate_repo`, and `write_event` methods.
  - Add new optional `repo` kwarg to `read_blocks_by_seq` and `read_events_by_seq` to limit returned results to a single repo.
- `datastore_storage`:
  - Add new `max_size` and `accept_types` kwargs to `AtpRemoteBlob.get_or_create` for the blob's `maxSize` and `accept` parameters in its lexicon. If the fetched file doesn't satisfy those constraints, raises `lexrpc.ValidationError`.
  - `DatastoreStorage.read_blocks_by_seq`: use strong consistency for datastore query. May fix occasional `AssertionError` when serving `subscribeRepos`.
- `xrpc_sync`:
  - Switch `getBlob` from returning HTTP 302 to 301.
  - Implement `since` param in `getRepo`.
  - `subscribeRepos`: wait up to 60s on a skipped sequence number before giving up and emitting it as a gap.
- `util`:
  - `service_jwt`: add new `**claims` parameter for additional JWT claims, eg `lxm`.
Breaking changes:
- `datastore_storage`:
  - `DatastoreStorage`: add new required `ndb_client` kwarg to constructor, used to get new context in lexrpc websocket subscription handlers that run server methods like `subscribeRepos` in separate threads (snarfed/lexrpc#8).
  - `DatastoreStorage.read_blocks_by_seq`: if the ndb context gets closed while we're still running, log a warning and return. (This can happen in eg `flask_server` if the websocket client disconnects early.)
  - `AtpRemoteBlob`: if the blob URL doesn't return the `Content-Type` header, infer type from the URL, or fall back to `application/octet-stream` (bridgy-fed#1073).
- `did`:
  - Cache `resolve_plc`, `resolve_web`, and `resolve_handle` for 6h, up to 5000 total results per call.
- `storage`: rename `Storage.read_commits_by_seq` to `read_events_by_seq` for new account tombstone support.
- `xrpc_sync`: rename `send_new_commits` to `send_events`, ditto.
- `xrpc_repo`: stop requiring auth for read methods: `getRecord`, `listRecords`, `describeRepo`.
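The `Content-Type` fallback described for `AtpRemoteBlob` above can be sketched with the standard library as follows. This is only an illustration of the header, then URL, then default order; `content_type_for` is a hypothetical helper, and arroba's own implementation may differ.

```python
import mimetypes
from urllib.parse import urlparse


def content_type_for(url, header_value=None):
    """Pick a blob's MIME type: header first, then URL extension, then a default.

    Hypothetical helper for illustration; not part of arroba.
    """
    if header_value:
        # Drop parameters such as "; charset=utf-8".
        return header_value.split(';')[0].strip()

    # Guess from the URL path only, ignoring any query string.
    guessed, _ = mimetypes.guess_type(urlparse(url).path)
    return guessed or 'application/octet-stream'
```

A URL with no recognizable file extension and no header falls through to `application/octet-stream`.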
Non-breaking changes:
- `did`:
  - Add `HANDLE_RE` regexp for handle validation.
- `storage`:
  - Add new `Storage.tombstone_repo` method, implemented in `MemoryStorage` and `DatastoreStorage`. Used to delete accounts. (bridgy-fed#783)
  - Add new `Storage.load_repos` method, implemented in `MemoryStorage` and `DatastoreStorage`. Used for `com.atproto.sync.listRepos`.
- `util`:
  - `service_jwt`: add optional `aud` kwarg.
- `xrpc_sync`:
  - `subscribeRepos`:
    - Add support for non-commit events, starting with account tombstones.
    - Add `ROLLBACK_WINDOW` environment variable to limit size of rollback window. Defaults to no limit.
    - For commits with create or update operations, always include the record block, even if it already existed in the repo beforehand (snarfed/bridgy-fed#1016).
    - Bug fix: populate `time` with the time each commit was created instead of the current time (snarfed/bridgy-fed#1015).
  - Start serving `getRepo` queries with the `since` parameter. `since` still isn't actually implemented, but we now serve the entire repo instead of returning an error.
  - Implement `getRepoStatus` method.
  - Implement `listRepos` method.
  - `getRepo` bug fix: include the repo head commit block.
- `xrpc_repo`:
  - `getRecord`: encode returned records correctly as ATProto-flavored DAG-JSON.
- `xrpc_*`: return `RepoNotFound` and `RepoDeactivated` errors when appropriate (snarfed/bridgy-fed#1083).
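For readers unfamiliar with handle validation, here is a sketch of what a handle regexp can look like. The pattern below follows the one given in the ATProto handle specification; it is an assumption that arroba's `HANDLE_RE` is similar, and `HANDLE_PATTERN` and `is_valid_handle` are hypothetical names.

```python
import re

# One or more dot-terminated labels, then a final label (the TLD) that
# must start with a letter. Labels are 1-63 characters and can't start
# or end with a hyphen.
HANDLE_PATTERN = re.compile(
    r'^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+'
    r'[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$')


def is_valid_handle(handle):
    """Check a handle's syntax only; this does not resolve it."""
    # Handles are limited to 253 characters overall.
    return len(handle) <= 253 and HANDLE_PATTERN.match(handle) is not None
```

A check like this rejects malformed domains such as `.foo.com` before any network lookup is attempted.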
- Bug fix: base32-encode TIDs in record keys, `at://` URIs, commit `rev`s, etc. Before, we were using the integer UNIX timestamp directly, which happened to be the same 13 character length. Oops.
- Switch from `BGS_HOST` environment variable to `RELAY_HOST`. `BGS_HOST` is still supported for backward compatibility.
- `datastore_storage`:
  - Bug fix for `DatastoreStorage.last_seq`, handle new NSID.
  - Add new `AtpRemoteBlob` class for storing "remote" blobs, available at public HTTP URLs, that we don't store ourselves.
- `did`:
  - `create_plc`: strip padding from genesis operation signature (for did-method-plc#54, atproto#1839).
  - `resolve_handle`: return None on bad domain, eg `.foo.com`.
  - `resolve_handle` bug fix: handle `charset` specifier in HTTPS method response `Content-Type`.
- `util`:
  - `new_key`: add `seed` kwarg to allow deterministic key generation.
- `xrpc_repo`:
  - `getRecord`: try to load record locally first; if not available, forward to AppView.
- `xrpc_sync`:
  - Implement `getBlob`, right now only based on "remote" blobs stored in `AtpRemoteBlob`s in datastore storage.
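To make the TID fix above concrete, here is a minimal sketch of base32-sortable TID encoding as described in the ATProto specification: a 64-bit integer holding a microsecond timestamp and a 10-bit clock ID, written as 13 characters. The function names are hypothetical and this is not arroba's own `util` code.

```python
# "Sortable" base32 alphabet: character order matches numeric order.
B32_SORTABLE = '234567abcdefghijklmnopqrstuvwxyz'


def encode_tid(timestamp_us, clock_id=0):
    """Encode a microsecond timestamp and 10-bit clock ID as a 13-character TID."""
    value = (timestamp_us << 10) | (clock_id & 0x3FF)
    chars = []
    for _ in range(13):
        # Emit 5 bits at a time, least significant first.
        chars.append(B32_SORTABLE[value & 0x1F])
        value >>= 5
    return ''.join(reversed(chars))


def decode_tid(tid):
    """Return (timestamp_us, clock_id) for a TID produced by encode_tid."""
    value = 0
    for char in tid:
        value = (value << 5) | B32_SORTABLE.index(char)
    return value >> 10, value & 0x3FF
```

Because the alphabet is in ascending order, TIDs for later timestamps compare greater as plain strings, which a raw decimal timestamp only does by coincidence of length.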
- Migrate to ATProto repo v3. Specifically, the existing `subscribeRepos` sequence number is reused as the new `rev` field in commits. (Discussion.)
- Add new `did` module with utilities to create and resolve `did:plc`s and resolve `did:web`s.
- Add new `util.service_jwt` function that generates ATProto inter-service JWTs.
- `Repo`:
  - Add new `signing_key`/`rotation_key` attributes. Generate, store, and load both in `datastore_storage`.
  - Remove `format_init_commit`, migrate existing calls to `format_commit`.
- `Storage`:
  - Rename `read_from_seq` => `read_blocks_by_seq` (and in `MemoryStorage` and `DatastoreStorage`), add new `read_commits_by_seq` method.
  - Merge `load_repo` `did`/`handle` kwargs into `did_or_handle`.
- XRPCs:
  - Make `subscribeRepos` check storage for all new commits every time it wakes up.
    - As part of this, replace `xrpc_sync.enqueue_commit` with new `send_new_commits` function that takes no parameters.
  - Drop bundled `app.bsky`/`com.atproto` lexicons, use lexrpc's instead.
Big milestone: arroba is successfully federating with the ATProto sandbox! See app.py for the minimal demo code needed to wrap arroba in a fully functional PDS.
- Add Google Cloud Datastore implementation of repo storage.
- Implement `com.atproto` XRPC methods needed to federate with sandbox, including most of `repo` and `sync`.
  - Notably, includes `subscribeRepos` server side over websocket.
- ...and much more.
Implement repo and commit chain in new `Repo` class, including pluggable storage. This completes the first pass at all PDS data structures. Next release will include initial implementations of the `com.atproto.sync.*` XRPC methods.
Initial release! Still very in progress. MST, Walker, and Diff classes are mostly complete and working. Repo, commits, and sync XRPC methods are still in progress.
Here's how to package, test, and ship a new release.
- Run the unit tests.

      source local/bin/activate.csh
      python -m unittest discover
      python -m unittest arroba.tests.mst_test_suite  # more extensive, slower tests (deliberately excluded from autodiscovery)

- Bump the version number in `pyproject.toml` and `docs/conf.py`. `git grep` the old version number to make sure it only appears in the changelog. Change the current changelog entry in `README.md` for this new version from unreleased to the current date.
- Build the docs. If you added any new modules, add them to the appropriate file(s) in `docs/source/`. Then run `./docs/build.sh`. Check that the generated HTML looks fine by opening `docs/_build/html/index.html` and looking around.
- Set `ver` and commit the release.

      setenv ver X.Y
      git commit -am "release v$ver"

- Upload to test.pypi.org for testing.

      python -m build
      twine upload -r pypitest dist/arroba-$ver*

- Install from test.pypi.org.

      cd /tmp
      python -m venv local
      source local/bin/activate.csh
      # make sure we force pip to use the uploaded version
      pip uninstall arroba
      pip install --upgrade pip
      pip install -i https://test.pypi.org/simple --extra-index-url https://pypi.org/simple arroba==$ver
      deactivate

- Smoke test that the code trivially loads and runs.

      source local/bin/activate.csh
      python
      from arroba import did
      did.resolve_handle('snarfed.org')
      deactivate

- Tag the release in git. In the tag message editor, delete the generated comments at bottom, leave the first line blank (to omit the release "title" in github), put `### Notable changes` on the second line, then copy and paste this version's changelog contents below it.

      git tag -a v$ver --cleanup=verbatim
      git push && git push --tags

- Draft a new release on GitHub. Enter `vX.Y` in the Tag version box. Leave Release title empty. Copy `### Notable changes` and the changelog contents into the description text box.
- Upload to pypi.org!

      twine upload dist/arroba-$ver*

- Wait for the docs to build on Read the Docs, then check that they look ok.
- On the Versions page, check that the new version is active. If it's not, activate it in the Activate a Version section.