
Storage layer: initial marketstore tsdb support with async OHLCV history loading. #308

Merged May 11, 2022 (105 commits)

Conversation

goodboy (Contributor) commented May 8, 2022

Replaces #247 and #305 (merging the history from both) and instead adds a more formal "storage layer" for retrieving and storing large OHLCV series from all major backends. The work here is best experienced with the new incremental update patchset from #302, which shows the new graphics performance improvements at work, though this should work moderately well as is.


TL;DR:

  • adds a marketstore docker container supervisor actor which can be spawned by passing pikerd --tsdb -> this closes Supervise the store #143
    • this command must be run as root, but the root perms will be dropped (in pikerd) shortly after marketstored (the supervised container) is started
  • adds working async with open_history_client() support to ib, binance, and kraken, which allows pulling in and storing a large amount of history in the tsdb, marketstore.
    • this API expects the async context manager endpoint to deliver an awaitable that can be called with an input datetime range; new special exceptions (NoData, DataUnavailable) can be raised to signal that the backend can't deliver data for a given range (or at all).
  • async "batch" fetching of OHLC history using trimeter in the backends that can handle it (currently only binance).
    • includes a history "frame generator" system which delivers datetime ranges that are passed to the trimeter request scheduler and which dynamically adjusts the request-time index when a gap is detected.
  • adds a piker.data.marketstore.Storage API layer which allows async, high level operations. The intention is to eventually have this layer support more tsdb providers like arctic and techtonicdb. Currently the only backend is marketstore, with client-side operations implemented using our anyio-marketstore library:
    • loading / reading existing ohlc time series by fqsn, with appropriate request size limiting, via .read_ohlcv()
    • writing ohlcv series by fqsn, with appropriate limits, via .write_ohlcv()
    • deleting time series entries via Storage.delete_ts()
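The open_history_client() endpoint contract above can be sketched roughly as follows; everything here (the exception bodies, the frame size, the return shape) is illustrative rather than piker's exact internals:

```python
from contextlib import asynccontextmanager
from datetime import datetime, timedelta, timezone
from typing import Optional


class NoData(Exception):
    '''Backend has no data for the requested datetime range.'''


class DataUnavailable(Exception):
    '''Backend can't deliver data for this range (or at all).'''


@asynccontextmanager
async def open_history_client(fqsn: str):
    # hypothetical per-backend setup (auth, sessions, etc.) would go here

    async def get_ohlc(end_dt: Optional[datetime] = None):
        # a real backend would request a "frame" of bars ending at
        # ``end_dt`` (or "now" when None) and return the array plus
        # the datetime range it actually covers.
        if end_dt and end_dt < datetime(2000, 1, 1, tzinfo=timezone.utc):
            raise DataUnavailable(f'no bars prior to {end_dt}')

        end = end_dt or datetime.now(timezone.utc)
        start = end - timedelta(minutes=1000)
        bars = []  # stand-in for a numpy OHLCV array
        return bars, start, end

    yield get_ohlc
```

The caller then walks backwards in time by repeatedly awaiting `get_ohlc(end_dt)` with each frame's start as the next call's end, stopping on `DataUnavailable`.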
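The Storage read/write/delete surface described above can be illustrated with a toy in-memory stand-in; the real marketstore-backed signatures in piker.data.marketstore may differ, but the column schema below matches marketstore's usual Epoch/OHLCV layout:

```python
import numpy as np

ohlc_dtype = np.dtype([
    ('Epoch', 'i8'),
    ('Open', 'f4'), ('High', 'f4'), ('Low', 'f4'), ('Close', 'f4'),
    ('Volume', 'f4'),
])


class InMemStorage:
    '''Toy stand-in for the ``Storage`` layer showing the high level
    async op surface: read / write / delete time series by fqsn.'''

    def __init__(self):
        self._series: dict = {}

    async def read_ohlcv(self, fqsn: str, limit: int = 20_000) -> np.ndarray:
        arr = self._series.get(fqsn, np.empty(0, dtype=ohlc_dtype))
        return arr[-limit:]  # crude request-size limiting

    async def write_ohlcv(self, fqsn: str, ohlcv: np.ndarray) -> None:
        prev = self._series.get(fqsn)
        self._series[fqsn] = (
            np.concatenate([prev, ohlcv]) if prev is not None else ohlcv
        )

    async def delete_ts(self, fqsn: str) -> None:
        self._series.pop(fqsn, None)
```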

What this does not (yet) introduce:

  • real-time ingest from tick feeds to marketstore
    • this was originally planned but the shorter path to get graphics downsampling methods up and running was to first start with OHLC history ingest and display
  • a full storesh interactive repl for managing the tsdb; there is a minimal ipython embedding at the moment but it is nowhere near complete and we need a follow-up task-issue to finish it.

TODO:

  • ._ahab.py supervisor:
    • probably deliver back net-socket info over the ctx.started() call?
      • grpc socket
      • ws addr
    • figure out what to do with the mkts.yaml config?
      • we could push a template from code to the user dir?
      • or should we just always gen it from python?
    • cli support for running pikerd with the tsdb stuff spawned?
      • maybe a --tsdb or --data or something?
    • step by step pikerd --tsdb test list:
      • pikerd --tsdb should raise DockerNotStarted and appropriate perms error on no sudo (for now)
      • ctrl-c should kill container instance
  • general marketstore config and operation:
    • should we always push newly received history from backends which is not yet in the tsdb to it?
      • we could also just offer this as a config option? eventually the UI should offer manual controls for such things..

to be done as #313

  • what docs should we offer regarding saving / deleting history?

Follow up (to be written in new task-issues and implemented in coming PRs):

moved to #314

  • tick ingest support and an accompanying feed-style inter-actor API to pull feeds from ingestor re-broadcast system(s):
    • tick ingest to marketstore from brokerd feeds and experiment with techtonicdb schema (some tinkering was already done in this patchset by @guilledk but is unfinished).
    • tick-to-ohlcv sampling if it can be done with the aggregator plugin (got a feeling we'll need to write at least a 1Sec bucket in order for this to work, looking at the code).
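As a rough illustration of that tick-to-ohlcv sampling idea (done in plain Python rather than marketstore's aggregator plugin; names and tick shape are hypothetical), a toy 1Sec bucketing routine:

```python
from collections import OrderedDict


def ticks_to_1s_ohlcv(ticks):
    '''Bucket ``(epoch_seconds, price, size)`` ticks into 1-second
    OHLCV bars; returns ``{epoch: (open, high, low, close, volume)}``.'''
    bars = OrderedDict()
    for ts, price, size in ticks:
        epoch = int(ts)  # floor the timestamp to its containing second
        if epoch not in bars:
            # first tick in this second seeds all four price fields
            bars[epoch] = [price, price, price, price, size]
        else:
            o, h, l, c, v = bars[epoch]
            bars[epoch] = [o, max(h, price), min(l, price), price, v + size]
    return {k: tuple(v) for k, v in bars.items()}
```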

these moved to #312

  • we need a timeseries diffing and syncing system to validate newly captured histories from real-time runs (which are almost always slightly different from the history provided by the providers' dbs 🙄) as well as for catching history mis-writes / gaps which need to be edited / corrected when bugs arise
  • a REPL (with ipython) that allows interaction, edit, and general management of the tsdb for both the purposes of research and just plain old data mgmt.
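A first cut of the gap-detection half of that diffing system could look like the sketch below; a real version would also compare OHLC values over overlapping ranges, and the `period` default here is just an assumed 1-minute spacing:

```python
import numpy as np


def find_time_gaps(epochs: np.ndarray, period: int = 60):
    '''Return ``(start, end)`` epoch pairs where consecutive samples in
    a supposedly ``period``-spaced series are further apart than
    expected, i.e. the ranges that need backfilling or correction.'''
    deltas = np.diff(epochs)
    gap_idxs = np.where(deltas > period)[0]
    return [(int(epochs[i]), int(epochs[i + 1])) for i in gap_idxs]
```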

@goodboy goodboy changed the base branch from master to marketstore May 8, 2022 16:43
@goodboy goodboy changed the base branch from marketstore to master May 8, 2022 18:03
@goodboy goodboy changed the title Storage layer Storage layer: initial marketstore tsdb support with async OHLCV history loading. May 8, 2022

tractor.run(main)
# @cli.command()
goodboy (Contributor, Author):
ahh yeah we can probably drop this.

the way i'd like to deal with "tsdb management" is the new storesh repl (with embedded ipython) which will have a small interactive API for manual db tinkering.

goodboy (Contributor, Author):

actually gonna leave it commented in here just in case we decide std cli cmds for this are handy.

@@ -254,61 +254,6 @@ def iterfqsns(self) -> list[str]:
return keys


def from_df(
goodboy (Contributor, Author):

ahh yeah right!

we can drop this (and maybe pandas altogether for now) since i rewrote the ib ohlc frame parser to just cast directly to numpy.

goodboy (Contributor, Author) commented May 9, 2022

Ok so my plan forward on this is to try and wrap up the final few TODOs and make issues/tasks for all the follow-up stuff.

Bleh :facepalm:, the ``end_dt`` in scope is not the "earliest" frame's
`end_dt` in the async response queue.. Parse the queue's latest epoch
and use **that** to compare to the last pushed datetime index..

Add more detailed logging to help debug any (un)expected datetime index
gaps.
It seems once in a while a frame can get missed or dropped (at least
with binance?) so in those cases, when the number of in-flight requests
(erlangs) is already at max, we just manually request the missing frame
and presume things will work out XD

Further, discard out-of-order frames that are "from the future" and
somehow end up in the async queue once in a while. Not sure why this
happens but it seems thus far just discarding them is nbd.
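The "discard frames from the future" logic described in that commit boils down to something like this sketch; the frame tuple shape and function name are hypothetical, not piker's actual internals:

```python
def split_future_frames(frames, last_pushed_end: int):
    '''Partition ``(start_epoch, end_epoch, bars)`` frames pulled off
    the async response queue: keep only those that extend history
    *backwards* from ``last_pushed_end``; frames whose end is newer
    ("from the future") get discarded.'''
    kept, dropped = [], []
    for frame in frames:
        start, end, bars = frame
        (dropped if end > last_pushed_end else kept).append(frame)
    return kept, dropped
```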
We return a copy (since a view doesn't seem to work..) of the
(field filtered) shm array contents which is the same index-length as
the source data.

Further, fence off the resource tracker disable-hack into a helper
routine.
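The copy-vs-view issue mentioned in that commit is a known numpy structured-array wrinkle: since numpy 1.16, multi-field indexing returns a *view* into the parent buffer (keeping the parent's padded itemsize), so handing it off directly would alias the shm segment. A minimal demonstration, with `repack_fields` producing an owned, tightly-packed copy:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# toy shm-like structured array (field names are illustrative)
ohlc = np.array(
    [(1, 10.0, 11.0), (2, 10.5, 11.5)],
    dtype=[('time', 'i8'), ('open', 'f8'), ('close', 'f8')],
)

# multi-field indexing yields a view sharing the parent's memory and
# padded itemsize; mutating it would mutate the source buffer:
view = ohlc[['open', 'close']]

# repack into an owned copy with no padding before handing it off:
packed = rfn.repack_fields(view).copy()
```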
goodboy (Contributor, Author) commented May 10, 2022

Pushed a few more piker.data._sharedmem changes from #302 which fixed some shm pushing / slicing edge cases.

Successfully merging this pull request may close these issues: Supervise the store (#143).