Custom User Functions #72

Closed
nathanielc opened this issue Dec 1, 2015 · 5 comments

@nathanielc
Contributor

Right now the only functions that can be applied to the data stream are those available in the InfluxQL language or those that can be expressed via lambda expressions. We need a way to allow users to define their own functions without having to compile them into Kapacitor.

Basic Plan so far:

  • Use RPC and external processes.
  • The RPC API will have basic Collect/Emit calls to stream data in and out of the process.
  • The initial candidate for RPC is gRPC.
  • Snapshot state periodically so that processes can resume where they left off after restart.
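
To make the Collect/Emit and snapshot ideas in the plan concrete, here is a minimal Python sketch of what a user function could look like from the process side. Everything in it is an assumption for illustration: the `MovingAverage` class, the `collect`/`snapshot`/`restore` method names, the dict-shaped points, and the JSON snapshot encoding are hypothetical and not an API defined in this thread.

```python
import json
from typing import Callable, Dict, List


class MovingAverage:
    """Hypothetical user function: emits the mean of the last `size` values."""

    def __init__(self, size: int, emit: Callable[[Dict], None]) -> None:
        self.size = size
        self.emit = emit  # callback that sends a point back to the host
        self.window: List[float] = []

    def collect(self, point: Dict) -> None:
        """Receive one point from the host and emit the current average."""
        self.window.append(point["value"])
        if len(self.window) > self.size:
            self.window.pop(0)
        average = sum(self.window) / len(self.window)
        self.emit({"time": point["time"], "value": average})

    def snapshot(self) -> bytes:
        """Serialize state so the process can resume after a restart."""
        return json.dumps({"window": self.window}).encode()

    def restore(self, data: bytes) -> None:
        """Load state produced by an earlier call to snapshot()."""
        self.window = json.loads(data.decode())["window"]
```

The host would call `snapshot()` periodically and hand the bytes back through `restore()` when the process is restarted, so the window does not have to be refilled from scratch.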

Questions:

  • How do we configure such a custom function and expose it via TICKscript? Maybe a separate lookup for custom chaining methods based on the registered custom functions.
  • Which languages do we want to support? The choice of RPC system will greatly affect this. Maybe we want a fast, robust system like gRPC and a simple JSON one for other languages.
@nathanielc
Contributor Author

I have a working prototype using gRPC. Initial thoughts:

  • gRPC is very new, so installation and use are difficult since it requires building lots of pre-release code from source.
  • Not many languages are supported yet. The key languages for us are the scientific ones: C, Python, R, maybe even Fortran. gRPC says its focus is on mobile, which does not align with our needs. This is not an issue yet, but it means integration with scientific languages is not likely to happen.
  • The streaming is very nice, especially since the bi-directional streams are independent and the order of reads vs writes doesn't matter.

@nathanielc nathanielc self-assigned this Dec 9, 2015
@skyrocknroll

How about adding support for a Lua runtime in Kapacitor?
If we add it, users can define their functions in Lua.
Heka https://hekad.readthedocs.org/en/v0.9.2/ allows users to write custom filters in Lua and load them at run time. Heka is also written in Go.

https://github.com/mozilla-services/lua_sandbox The Lua sandbox is open source.

@nathanielc
Contributor Author

@skyrocknroll we thought a lot about using Lua initially. It would solve a piece of the problem, but we decided that it would not solve enough.

The goal of custom functions is not necessarily that users can write their own functions, but rather that they can use their existing algorithms/code for processing their data. Many users will already have their algorithms written in some language (C/C++, Python, etc.), and many of them will not be able to rewrite the code effectively in any other language. By creating a method for getting data into and out of Kapacitor, users can just create wrappers as needed.

Having multiple language support is important enough that we decided against Lua.

@nathanielc
Contributor Author

I finished a survey of serialization formats and RPC frameworks. Here are my thoughts:

Serialization:
There are many serialization formats out there, and protobuf is one of the most popular. While C/C++, Java, and Python are the only languages officially built into the compiler, support for most other languages exists. See https://github.com/google/protobuf/wiki/Third-Party-Add-ons#programming-languages
Protobuf 3 also provides a JSON serialization along with its binary format. This way we can support any language that can read JSON. It also makes it easier for users to write their own simple functions without having to consume protobuf libraries, if they are not overly concerned about performance. For these reasons we have decided to go with protobuf version 3. Version 3 is still in beta, but as Kapacitor is young this should not be an issue.

RPC:
Protobuf provides an RPC system via service declarations that allow methods to be defined; RPC code for the services can then be generated through protoc plugins. Currently gRPC is the main framework that takes advantage of the RPC system in proto3. But gRPC is limited in the languages it supports, and adding language support is non-trivial. Both the complexity of gRPC code generation and the limited language support made us decide not to use gRPC.
Proto3's RPC system can be used without gRPC, but that effectively requires writing your own version of gRPC. While a simplified socket-based gRPC system seems beneficial, we would have to write a protoc plugin to generate code for every language we wanted to support.

Finally, protobuf messages can easily be streamed over a socket, so communication can be handled directly without using an RPC framework. This is the approach that we like the best so far. Our needs for communicating with the process are not complex: basic heartbeat and snapshot messages plus the actual data flow. Using STDIN and STDOUT to send messages to and receive messages from a process is simple and effective. This way, all that needs to be done to support a language is to write code that reads and writes protobuf messages on a socket. This is trivial code to write if it doesn't exist already for a given language. Writing code for a given language is much simpler and easier to maintain than a code generation program. Since using the protobuf RPC framework requires writing a code generation program, we decided not to use it and instead to use protobuf messages directly and write our own clients/servers in the desired languages.
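
As a rough illustration of how little code a language needs to read and write messages on a byte stream, here is a Python sketch. It assumes each message is prefixed with its length as a base-128 varint; that framing is a common protobuf streaming convention, not something specified in this thread. The helper names (`encode_varint`, `read_varint`, `write_message`, `read_messages`) are hypothetical, and payloads are treated as opaque bytes that would in practice be serialized protobuf messages.

```python
import io
from typing import BinaryIO, Iterator, Optional


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None if the stream ended before it started."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_message(stream: BinaryIO, payload: bytes) -> None:
    """Write one length-prefixed message and flush it to the peer."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)
    stream.flush()


def read_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield message payloads until the stream reaches a clean end."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message body")
        yield payload
```

In a real process the same functions could be pointed at `sys.stdin.buffer` and `sys.stdout.buffer` instead of an in-memory buffer, with each payload decoded by whatever protobuf library the language provides.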

In summary, simplicity wins out: it allows more widespread language support and a lower barrier to entry for writing your own custom functions. See this WIP PR for details of the initial proto message definitions: #99.

@skyrocknroll

Thanks

nathanielc added a commit that referenced this issue Jun 9, 2016
…47a777

bbd5bb6 Make struct decoding also handle empty Primitives
66416ff Decode empty Primitives into nullable values successfully
5b80cc5 Clean up slice decoding handling
75869ce Unify two switch arms
1946733 Properly encode struct fields having toml tags without a name
0e5f512 Don't treat non-empty strings of whitespace as empty for omitempty
e27e134 add bool empty option
a4eecd4 Remove extra lexer advance
0c4ce10 In Decode, reuse slices when possible
dacf173 Merge remote-tracking branch 'yourkarma/master'
166915e Merge pull request #82 from mjibson/fix-decode-omit
3e3bd42 Don't panic when failing to parse a timestamp
001f7af Fix no-op utf-8 validity test
2678c1e Add tests for ignored fields
2fe0945 Flesh out anonymous field encoding
4cc516a Merge remote-tracking branch 'shawnps/gofmt'
f772cd8 Merge pull request #112 from stapelberg/inaccessible-go1.6
77ccfcd Bugfix: update check for inaccessible fields for Go 1.6
312db06 Merge pull request #93 from bep/parse-panic
782628a gofmt -s
5c4df71 Merge pull request #108 from kezhuw/fix_endless_loop
c3bcd45 Fix endless loop in table name lexing
851e5be Panic instead of os.Exit for illegal state situations in parser
110f954 Make new destination slice when length doesn't match.
54c24c1 Use correct name during decode with omit options
056c9bc Merge pull request #81 from bbuck/omitempty
aa708eb Clean up, remove zero as 'empty' and add 'omitzero' option
d918309 Support for omitempty, as well as tests for omitempty.
443a628 Merge pull request #72 from binary132/fix-readme
9baf8a8 Updated link for TOML v0.2.0
f706d00 Support quoted keys.
7eda3e2 Remove escape for '/'.
0f9db13 Forbid '#' in table names.
a6db6cf Simplify lexer for Unicode escapes and add support for `\U`.
3644d30 Fix typo. Thanks @ChrisHines
32ee81d Various formatting fixes. 80 cols.
0eaa740 Fix #66.
3883ac1 Merge pull request #59 from fromonesrc/patch-1
237e946 Merge pull request #57 from gisakulabs/UnmarshalTOML
b2c5eb4 Merge pull request #61 from halostatue/multiline
1956abe Implement multiline strings and raw multiline strings.
73199af Support single-line raw strings.
71fac5b Fixed comment typo
ac8879e Fix readme typo on Decode method
67ade19 Modified the `Unmarshaler` interface to `.UnmarshalTOML(v interface{})`
bc95534 Added support for UnmarshalTOML() interface.

git-subtree-dir: vendor/github.com/BurntSushi/toml
git-subtree-split: 747a77770ca4730759d5944e3a7fe869d452648b

2 participants