-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(processing_engine): initial implementation of Processing Engine plugins and triggers #25639
Conversation
schema = { workspace = true } | ||
|
||
[dependencies.pyo3] | ||
version = "0.23.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a quick comment that the choice of https://github.com/PyO3/pyo3 seems reasonable from a security and maintenance perspective. It is an active project with lots of users (23.4k) and contributors (355). It has a CVE history:
- https://www.cve.org/CVERecord?id=CVE-2020-35917 (fixed in 0.12.4)
- https://www.cve.org/CVERecord?id=CVE-2024-9979 (fixed in 0.22.4; we're using 0.23.3)
PyO3/pyo3#4590 (for CVE-2024-9979) demonstrates that the project cares about its users and they did a 0.22.4 release that mitigates the security issue even though there were plans in the works for 0.23 to remove the problematic code that was the context of the vulnerability.
https://github.com/PyO3/pyo3 lists a number of popular projects that use pyo3. RedHat is interested in the project (they reported CVE-2024-9979) and a cursory search show they have products that use it: https://access.redhat.com/security/cve/cve-2024-9979. Debian has it packaged. All of this adds up to this bodes well for continued support.
It is also properly hooked into the GitHub advisory database: https://github.com/advisories?query=pyo3 so we'll see issues in pyo3 itself via dependabot alerts.
When the time is right, we should discuss where we are getting the embedded python interpreter that is provided to pyo3 (depend on the system? will we build it ourselves? grab it from somewhere official? etc) as there is a security maintenance angle here, but we don't have to discuss that now. Depending on the choice, we'll have to track security issues in the embedded interpreter specially (since dependabot can't help with that). This is tractable.
As an fyi, might want to be aware of PyO3/pyo3#4265 which is pyo3 dealing with the non-GIL python versions/builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to depend on the system. We want the interpreter included in the InfluxDB binary so that the user has no other local system setup to do other than install InfluxDB. I'm guessing we'll use PyOxidizer for this, although I'm not sure it's fully supported yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to depend on the system
Right. This makes perfect sense. Not trying to distract from this PR, but we'll definitely want to understand the maintenance story of the interpreter itself that we ship as it will require security support. I'm unfamiliar with PyOxidizer, but (unlike pyo3), it isn't widely used and doesn't seem terribly active (doesn't mean we can't choose it; just means there is a risk associated with the project going away). On the other hand, it is a proper crate and so dependabot alerts/etc we'd just get.
Other options exist: eg, we could choose to build the official cpython ourselves. That could be from upstream python but an alternative would be taking a linux distro's sources but building it the way we want; if we chose wisely (eg, Ubuntu 24.04 LTS's source deb or perhaps a newer Rocky Linux srpm) we'd get years of 'free' upgrades on a stable base (distros stay on a particular upstream version and backport patches; whenever a security notice was issued, we'd simply refresh our sources on the new Ubuntu source deb/Rocky srpm/etc).
I'm not saying we should do that: pyoxidizer exists for a reason after all, but this type of thing is probably worth entertaining even if in the end we reject it. (It could also be our escape hatch if pyoxidizer goes away).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unfamiliar with PyOxidizer, but (unlike pyo3), it isn't widely used and doesn't seem terribly active (doesn't mean we can't choose it; just means there is a risk associated with the project going away). On the other hand, it is a proper crate and so dependabot alerts/etc we'd just get.
TLDR: +1 from security on pyo3+pyoxidyzer. pyoxidizer seems like a reasonable choice to start with. It provides builds (or we can choose to build based on its python-build-standalone
support) for the architectures we want and it solves a lot of problems that we'd encounter if we built our own.
I looked into this a bit and it appears (please correct me!) that PyOxidizer is a glue layer that allows you to choose from a list available python distributions, of which it lists CPython (upstream python) 3.8.15, 3.9.15 and 3.10.8. This list is based on what https://github.com/indygreg/python-build-standalone supports, and it seems we can choose to build from source or to download a pre-built release from https://github.com/indygreg/python-build-standalone/releases.
Official python lists 3.8's EOL as 2024-10-07, 3.9 as 2025-10 and 3.10 as 2026-10. Official python lists 3.8's EOL as 2024-10-07, 3.9 as 2025-10 and 3.10 as 2026-10. 3.10.16 is the latest 3.10 release (and there are security fixes since 3.10.8). 3.9.21 is the latest 3.9 release (and there are security fixes since 3.9.15). 3.8.20 is the latest 3.8 release (and there are security fixes since 3.8.15).
At first glance it seems like the project is trailing behind in both micro and minor versions, however, the aforementioned https://pyoxidizer.readthedocs.io is out of date and if we compare https://github.com/indygreg/python-build-standalone/releases with https://www.python.org/downloads/, pyoxidizer is closely following upstream with up to date CPython builds from upstream. Eg, https://github.com/indygreg/python-build-standalone/releases/tag/20241205 says:
# Breaking
* Drop support for Python 3.8: The [20241008 release](https://github.com/indygreg/python-build-standalone/releases/tag/20241008) was the last release with Python 3.8 distributions. Support has now been removed from the build process.
* Rename python3.13t.exe to python.exe in Windows free-threaded distributions: This matches the other Windows distributions. See https://github.com/astral-sh/uv/issues/8298 for discussion.
# Upgrades
* CPython 3.9.20 -> 3.9.21
* CPython 3.10.15 -> 3.10.16
* CPython 3.11.10 -> 3.11.11
* CPython 3.12.7 -> 3.12.8
* CPython 3.13.0 -> 3.13.1
...
Assuming pyoxidizer does what we want and we're comfortable with the health of the project (perhaps we'd want to invest in the project through code contributions or maintenance at some point?), then it seems like a fine choice.
A more reasonable escape hatch to the one I mentioned before if the project goes away is likely to fork the project and update python-build-standalone
/etc to follow upstream python and to build the specific new version we want to target.
Depending on the choice, we'll have to track security issues in the embedded interpreter specially (since dependabot can't help with that). This is tractable.
If we choose pyoxidizer, we aren't going to get dependabot alerts and PRs automatically for the runtime itself (we would for the rust crates of course). This means if we choose this approach, our CVE tracking will necessarily need to follow upstream python and pyoxidizer releases. This is all fine: if this is the chosen path, I'll simply follow upstream python security announcements and then file issues/triage for our products in the normal way.
d2ec446
to
e12a5cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments and questions, but overall looks like a good starting point!
pub struct ProcessingEnginePlugin { | ||
pub plugin_name: String, | ||
pub code: String, | ||
pub function_name: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have this and plugin name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the call site for the python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that this is the Python function that Rust calls to hand over the rows to the Python plugin? If so, I think we should pick a static name and have that be part of the plugin authoring process. Your plugin always has that function defined. Convention over configuration 😄
@@ -525,6 +527,28 @@ pub struct MetaCacheDelete { | |||
pub cache_name: Arc<str>, | |||
} | |||
|
|||
#[derive(Debug, Serialize, Deserialize, Eq, PartialEq, Clone)] | |||
pub struct PluginDefinition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these types all seem the same as the ones above? Can we just have one set of types and share them between the WAL and the plugin crate?
Another question I forgot to ask, what is installing the maturin crate for? |
The maturin crate is for building the influxdb3_py_api crate as a Python package so that plugin code can call next_point() to iterate. |
…plugins and triggers
a8a73ec
to
bd93d20
Compare
…nd system-py feature flag
bd93d20
to
b216b6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, let's GOOOOOOOOOO
This introduces Processing Engine plugins and triggers. Initially I've implemented one type of plugin: python functions that will be passed an iterator over the influxdb3_Client_py Point class. For triggers, users specify the trigger name, plugin name, trigger specification (what it should trigger off of).
Local Development
For local development, follow the following steps:
python3 -m venv myenv
.pip install influxdb3-python
.maturin
withcargo install maturin
.maturin develop -E "myenv"
.cargo build --quick-release
.PYTHONPATH=$PYENV_DIR/lib/python3.13/site-packages ./target/quick-release/influxdb3 serve --object-store file --data-dir /Users/USERNAME/influx-data --host-id $MY_HOST_ID
"load_test"
.measurement_data
:This is initial work, much more to be added.