Change the message layout #28

Closed
smyrman opened this issue Apr 27, 2021 · 5 comments

@smyrman
Contributor

smyrman commented Apr 27, 2021

UPDATED.

The final proposal is that we should change the message format so that we end up with:

msg:
  topic: "<Input ID>"
  payload:
    times: ["<timestamp>", ...]
    values: [(<number>||null), ...]
  signal: <Signal> # Match https://docs.clarify.us/reference#signal
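
As an illustration of the layout above, here is a minimal sketch of a check that a message follows it. The helper name `validateInsertMessage` is hypothetical and not part of the plugin. It also assumes that `msg.payload` and `msg.signal` are each optional (so data and meta-data can travel in separate flows) and that `times` and `values` must have equal length; neither rule is spelled out in the proposal.

```javascript
// Hypothetical helper (not part of the plugin): returns a list of problems
// found in a message that is supposed to follow the proposed layout.
// ASSUMPTIONS: payload and signal are both optional, and times/values must
// have the same length.
function validateInsertMessage(msg) {
  const errors = [];

  // The Input ID lives in msg.topic.
  if (typeof msg.topic !== "string" || msg.topic === "") {
    errors.push("msg.topic must be a non-empty Input ID string");
  }

  // Data part: parallel arrays of timestamps and values.
  if (msg.payload !== undefined) {
    const { times, values } = msg.payload || {};
    if (!Array.isArray(times) || !Array.isArray(values)) {
      errors.push("msg.payload.times and msg.payload.values must be arrays");
    } else {
      if (times.length !== values.length) {
        errors.push("times and values must have the same length");
      }
      if (times.some((t) => typeof t !== "string" || Number.isNaN(Date.parse(t)))) {
        errors.push("every timestamp must be a parseable date string");
      }
      if (values.some((v) => v !== null && typeof v !== "number")) {
        errors.push("every value must be a number or null");
      }
    }
  }

  // Meta-data part: only checked for being an object when present.
  if (msg.signal !== undefined && (msg.signal === null || typeof msg.signal !== "object")) {
    errors.push("msg.signal must be an object when present");
  }

  return errors;
}
```

In a Function block this could be used to drop or flag malformed messages before they reach the insert block.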

The Input ID is put in msg.topic to allow simpler topic-based rate-limiting or down-sampling flows using the standard rbe or delay blocks.

The signal meta data is moved out of the payload to:

  1. Align better with API reference documentation.
  2. Allow separation of meta-data and data into different flows.
  3. Allow easier detection of whether meta-data is present.

msg.payload.series is renamed to msg.payload.values to better distinguish the payload format from the API v1 Data Frame format.

It is further proposed that the Advanced Settings section, where you can remap the message fields, is removed, fully committing to the final format. Either way, remapping is easy to do via a Change or Function block when needed.


Original text (deprecated)

Problems to solve

There are a few problems we can address with the current message design.

Meta-data and data in same parameter

Everything (data and meta-data) is combined in the payload by default. This makes it hard, both mentally and programmatically, to check whether meta-data is provided. It also complicates providing meta-data in cases where all signal meta-data is statically configured.

Drift between Node-RED format and RPC documentation

The format is similar, but not identical, to the Clarify JSON RPC API. Aligning the formats where practical could make documentation and understanding easier.

Main issues:

  • Because meta-data and data are combined in the same structure, we cannot refer directly to the "Signal Info" type in the RPC API documentation.
  • The data object has the same field names as the Clarify JSON RPC API "Data Frame", but only the content of the "times" field matches exactly. The "series" field is a map of Input ID to an array of values in RPC, while it's an array of values in Node-RED.

Note on the current design

In the current design, we expect each message arriving at the insert block to hold only one signal. While this makes the Data Frame format different in Node-RED than in RPC, it has clear benefits when creating flows, because it becomes easier to manipulate or alter each individual message (e.g. by inserting meta-data).

There could still be use-cases for allowing multiple signals per message, such as aggregating multiple signals through a set of standard blocks. That only applies if we have a use-case for doing this inside Node-RED, and if we expect the numbers used in the aggregation to arrive in the same message. Multi-signal messages will, on the other hand, complicate single-signal manipulations, because each of them involves either a map traversal or a key lookup.

Allowing multiple data formats is likely not a good idea, because it makes any later attempt at adding aggregate blocks nearly impossible. Therefore, we should either stick with single-signal messages, or migrate to multi-signal messages.

Next version message design

Update: Removed the original proposal of adding separate "format" and "insert" blocks. A format block would likely be either too complex or not complex enough. It seems better to let this be done by custom functions, which are unavoidable in practice anyway.

Main goals:

  • Ease documentation by letting input be as similar as possible to the RPC API input.
  • Ease usage by keeping data and meta-data separate.

Insert message design

Below are a few alternatives for a new message format. This can either be enforced, or done by changing the default values in the current advanced options section.

Alt 1: Single signal

msg:
  input: "<Input ID>"
  payload:
    times: ["<timestamp>", ...]
    values: [(<number>||null), ...]
  signal: <Signal Info> // Exact match with RPC Signal Info object

Achieves:

  • Allow single signal flows (easy data / meta-data manipulation).
  • Align signal field to exactly match Signal Info RPC docs.
  • Rename series to values to better highlight the difference from the Data Frame's series field in the RPC docs.

Variation: rename "input" field to "topic".

A possible variation of this format replaces the "input" key with the key "topic". The "topic" field is treated specially by some blocks, such as the rate limiter (a.k.a. "delay") block. There may be more blocks that treat "topic" differently, and we should investigate whether there are good use-cases where having the Input ID in the topic would be useful.

Relevant questions:

  • Is there a real use-case to rate-limit (and optionally drop intermediate messages) based on Input ID?
  • Are there any other blocks that treat "topic" in a way where enforcing it to contain an Input ID has a benefit?

Docs on using topic: https://nodered.org/docs/developing-flows/message-design#using-msgtopic

Alt 2: Multi signal

msg:
  payload: <Data Frame> // Exact match with the RPC Data Frame object.
    times: ["<RFC 3339 timestamp>", ...]
    series:
      <Input ID>: [(<number>||null), ...]
  inputs: // Exact match RPC saveSignals method's `inputs` parameter.
    <Input ID>:  <Signal Info> // Exact match with RPC Signal Info object.

Achieves:

  • Allow multi signal flows (aggregations/calculations based on multiple signals).
  • Align inputs field to exactly match integration.saveSignals RPC method's inputs parameter.
  • Align payload field to exactly match integration.insert RPC method's data parameter.
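
To make the relationship between the two alternatives concrete, here is a minimal sketch that splits an Alt 2 (multi-signal) message into Alt 1 style single-signal messages, using the "topic" variation for the Input ID. The function name `splitDataFrame` is hypothetical, and the field names are taken from the layouts in this proposal rather than from any existing block.

```javascript
// Hypothetical helper: turns one multi-signal message (Alt 2) into one
// single-signal message per Input ID (Alt 1, with the Input ID in topic).
function splitDataFrame(msg) {
  const { times = [], series = {} } = msg.payload || {};
  const inputs = msg.inputs || {};
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // An Input ID may appear in the data, in the meta-data, or in both.
  const ids = new Set([...Object.keys(series), ...Object.keys(inputs)]);

  return [...ids].map((id) => {
    const out = { topic: id };
    if (has(series, id)) {
      // Every series shares the same times array; copy it per message.
      out.payload = { times: [...times], values: [...series[id]] };
    }
    if (has(inputs, id)) {
      out.signal = inputs[id];
    }
    return out;
  });
}
```

In a Function block, `return [splitDataFrame(msg)];` would send the resulting messages one after another on the first output. Going the other way (merging single-signal messages into one Data Frame) requires aligning timestamps across signals, which is part of why the two formats are hard to mix.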

Note on backwards compatibility

How much we need to care about backwards compatibility for this change depends on how much this plugin is used. We can probably get some numbers on this on the Clarify side. It also depends on when we do this: before v1, for v2, or as a v1.x feature.

Here are some options for backwards compatibility:

  1. Enforce new schema (no backwards compatibility).
  2. Enforce new schema (no config), but allow dragging in a message translator block to translate a "v1" default message to a "v2" message.
  3. Change default Advanced Settings of current blocks. Existing blocks will keep working, but if you drag in a new block, it will get the new config and you may need/want to either alter your message format or your Advanced Config.
  4. Keep the current block as "v1", and add a completely new "v2" block for the new message format 😢.
@smyrman smyrman added the proposal Something that is up for discussion. label Apr 27, 2021
@smyrman smyrman changed the title Itterate on message desing Itterate on block desing & messge layout Apr 27, 2021
@smyrman
Contributor Author

smyrman commented May 1, 2021

Actually, the rbe block is potentially a good case for placing the Input ID in the topic!

The block allows filtering out messages when there is no change to the payload (or in our case, to payload.values when reading one data point at a time from the source). If set up per topic, this can be used to filter out messages where the data doesn't change from one point to the next.
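
The per-topic behaviour described above can be sketched in plain JavaScript. This is only an illustration of the idea, not the rbe block's actual implementation; `createTopicFilter` is a hypothetical name, and comparing values via `JSON.stringify` is a simplification chosen for the sketch.

```javascript
// Illustrative per-topic "report by exception" filter. It remembers the last
// payload.values seen for each topic (Input ID) and only lets a message
// through when the values differ from the previous ones for that topic.
function createTopicFilter() {
  const lastByTopic = new Map();

  return function filter(msg) {
    // Simplification: compare the serialized values array.
    const current = JSON.stringify(msg.payload.values);
    if (lastByTopic.get(msg.topic) === current) {
      return null; // unchanged for this topic: drop the message
    }
    lastByTopic.set(msg.topic, current);
    return msg;
  };
}
```

Because the state is keyed on msg.topic, an unchanged value on one Input ID does not suppress updates on another. In a Node-RED Function block, returning null stops the message from being passed on.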

@smyrman
Contributor Author

smyrman commented May 1, 2021

A potentially useful flow combining the "rbe" and "delay" blocks:

Flow:

[{"id":"d3afa196.7767","type":"tab","label":"Flow 1","disabled":false,"info":""},{"id":"41dc38bf.fc24f8","type":"rbe","z":"d3afa196.7767","name":"block no change","func":"rbe","gap":"","start":"","inout":"out","septopics":true,"property":"payload.values","x":510,"y":160,"wires":[["1bd47c45.d109e4"]]},{"id":"230f9127.9ced7e","type":"delay","z":"d3afa196.7767","name":"avoid gap (1h)","pauseType":"queue","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"hour","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":true,"x":500,"y":200,"wires":[["1bd47c45.d109e4"]]},{"id":"af2a5c3.bf55da","type":"inject","z":"d3afa196.7767","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"1","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":130,"y":180,"wires":[["d908ba49.751178"]]},{"id":"d908ba49.751178","type":"function","z":"d3afa196.7767","name":"data source","func":"\nconst time = new Date(msg.payload);\nconst data = 1;\n\nreturn {\n    \"topic\": \"input-id\",\n    \"payload\": {\n        \"times\": [time.toISOString()],\n        \"values\": [data],\n    },\n    \"signal\": {\n        \"gapDetection\": \"PT1H1S\",\n    }\n}","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":300,"y":180,"wires":[["41dc38bf.fc24f8","230f9127.9ced7e"]]},{"id":"1bd47c45.d109e4","type":"debug","z":"d3afa196.7767","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":710,"y":180,"wires":[]}]

@smyrman
Contributor Author

smyrman commented May 1, 2021

So with that, I think I will recommend Alt1 with the Input ID in the topic:

Alt 1: Single signal (topic variant)

msg:
  topic: "<Input ID>"
  payload:
    times: ["<timestamp>", ...]
    values: [(<number>||null), ...]
  signal: <Signal> // Exact match with RPC Signal object from the reference docs.

I won't deny there could be interesting use-cases for Alt2, but I am currently quite happy with the idea of prioritizing the single signal flows. This means prioritizing flows that help get the right amount of data into Clarify, and then seeing whether the problem of aggregation can be solved within Clarify itself.

@smyrman smyrman changed the title Itterate on block desing & messge layout Change the message layout May 20, 2021
@smyrman
Contributor Author

smyrman commented May 20, 2021

Updated proposal text to match previous comment.

@smyrman smyrman removed the proposal Something that is up for discussion. label Jun 28, 2021
bbergshaven added a commit that referenced this issue Jun 29, 2021
Updated the format of the messages according to this proposal: #28

The Input ID is put in msg.topic
The signal meta data is moved out of the payload to msg.signal
msg.payload.data.times is renamed/moved to msg.payload.times
msg.payload.data.series is renamed/moved to msg.payload.values
New message format:

msg:
  topic: "<Input ID>"
  payload:
    times: ["<timestamp>", ...]
    values: [(<number>||null), ...]
  signal: <Signal> // Match https://docs.clarify.us/reference#signal
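
For flows still producing the previous layout, a Function block could translate messages along these lines. This is only a sketch: the move of payload.data.times and payload.data.series follows the commit message above, but the old locations of the Input ID (`msg.payload.id`) and the meta-data (`msg.payload.signal`) are assumptions made for illustration, and `translateToNewLayout` is a hypothetical name.

```javascript
// Hypothetical translator from the previous message layout to the new one.
// ASSUMPTION: the old message kept the Input ID in msg.payload.id and the
// signal meta-data in msg.payload.signal. Only payload.data.times and
// payload.data.series are confirmed by the commit message.
function translateToNewLayout(oldMsg) {
  const { payload: _ignored, ...rest } = oldMsg;
  const { id, signal, data } = oldMsg.payload || {};

  // Keep unrelated message properties, put the Input ID in topic.
  const out = { ...rest, topic: id };

  if (data !== undefined) {
    // payload.data.times -> payload.times, payload.data.series -> payload.values
    out.payload = { times: data.times, values: data.series };
  }
  if (signal !== undefined) {
    // Meta-data moves out of the payload to msg.signal.
    out.signal = signal;
  }
  return out;
}
```

If the old flow used the Advanced Settings to remap fields, the property paths read here would need to be adjusted to match.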
@bbergshaven
Collaborator

Implemented Alt1. Will look into supporting Dataframe in later releases.
