SkyStreamer is an AT Firehose consumer that streams new posts from Bluesky. It provides simplified wrapper data types and a streaming API for easy filtering of events on the AT Firehose, based on ATrium.
SkyStreamer is part of Project Zenith, an experimental social media analytics project by Fyra Labs.
The AT protocol firehose can be a very useful source of data, one may want to consume it in a more structured way. SkyStreamer helps filter out the noise and only stream new Bluesky posts from the firehose.
- Make data types non-dependent on SurrealDB while maintaining data types
- Export a crate for easy integration with other projects
- SSL-independent implementation (Rustls/OpenSSL agnostic)
- Optimized, multiple-threaded implementation
- Configuration
SkyStreamer collects large amounts of new data streaming from Bluesky itself, which then can be used for many various purposes.
While the network and the data itself is visibly public to everyone, Some users may be uncomfortable with their data being collected and used in this way, especially for machine learning or similar purposes.
Important
Please be mindful of the data you are collecting and how you are using it, and respect their consent if possible.
SkyStreamer comes with a streaming API that can be used alongside Rust's iterator API, which can be used to filter out data. However, it does not filter or redact any data by default, you will be responsible for your own data processing.
Fyra Labs is not responsible for any misuse of the data collected by SkyStreamer. You are responsible for your own actions.
SkyStreamer can be used as a library or as a CLI. The library provides a Tokio stream that can be used to stream data from the AT Firehose.
By default, SkyStreamer will stream its data into a SurrealDB database. You may change this using environment variables or CLI arguments.
You may also stream the data into an IETF RFC 4180 CSV-formatted file, or log it into the console.
skystreamer -E csv -o data.csv
See skystreamer --help
for more information.
Please see the skystreamer/examples
directory for examples on how to use SkyStreamer as a library.
SkyStreamer also has a Prometheus exporter implementation that can be used to show statistics about Bluesky posts as a whole.
See skystreamer-prometheus-exporter for the implentation.
You can also use the Docker image from ghcr.io/fyralabs/skystreamer-prometheus-exporter:latest
to run the exporter. Configure it like you would
normally configure Prometheus data sources.
The data itself only includes numbers of posts and various counters per metric, and does not include any actual post data except for external link domains and hashtags.
MAX_SAMPLE_SIZE
: Maximum number of posts to count before overflow, default is none (Maxu64
).NORMALIZE_LANGS
: Attempt to normalize post language codes to their respective IETF BCP 47 codes, default istrue
. Set tofalse
to export raw codes from the AT firehose.
SkyStreamer was originally implemented as a simple Python script. You can find the old implementation in the legacy
directory.
SkyStreamer is licensed under the MIT License. See the LICENSE
file for more information.
This software is provided as-is, without any warranty or guarantee of any kind. Use at your own risk. Fyra Labs is not responsible for any misuse or damage caused by this software.