init version of online serving and rolling #290

Merged
merged 77 commits into from
May 17, 2021
Changes from 4 commits
Commits
04b916c
safe yaml loader
you-n-g Feb 16, 2021
83237ba
yml afe load
you-n-g Feb 17, 2021
1e5cf1c
init version of online serving and rolling
you-n-g Feb 26, 2021
24024d5
qlib auto init basedon project & black format
you-n-g Feb 27, 2021
c4733f6
Merge pull request #1 from you-n-g/online_srv
lzh222333 Mar 2, 2021
b84156f
Consider more situations about task_config.
lzh222333 Mar 3, 2021
05cf0e1
add task_generator method and update some hint
lzh222333 Mar 3, 2021
fd2c1ba
Update some hint
lzh222333 Mar 3, 2021
2882929
Add an example about workflow using RollingGen.
lzh222333 Mar 3, 2021
a244f87
modified the comments
lzh222333 Mar 8, 2021
def132e
modified format and added TaskCollector
lzh222333 Mar 8, 2021
83dbdfb
finished document and example
lzh222333 Mar 9, 2021
e2f5827
update task manager
lzh222333 Mar 10, 2021
2ca2071
format code
lzh222333 Mar 10, 2021
48f0fc1
first version of online serving
lzh222333 Mar 11, 2021
0df88c0
bug fixed and update collect.py
lzh222333 Mar 11, 2021
44a7dc0
update docs and fix duplicated pred bug
you-n-g Mar 12, 2021
5de7870
Merge branch 'online_srv' of github.com:you-n-g/qlib into online_srv
you-n-g Mar 12, 2021
6d8aa21
the second version of online serving
lzh222333 Mar 12, 2021
9d84d38
format code and add example
lzh222333 Mar 12, 2021
e4e8a4a
fix task name & add cur_path
you-n-g Mar 12, 2021
8362780
fix import bug
you-n-g Mar 14, 2021
646d899
update docstring and document
lzh222333 Mar 15, 2021
0bc49da
add task management to index.rst
lzh222333 Mar 15, 2021
e3730b3
more clearly structure
lzh222333 Mar 16, 2021
5953365
finished update_online_pred demo
lzh222333 Mar 16, 2021
d33041d
format example
lzh222333 Mar 16, 2021
8abdd63
online_serving V3
lzh222333 Mar 18, 2021
84d5318
Merge branch 'online_srv_wd' into online_srv
you-n-g Mar 19, 2021
d66d4ec
Merge branch 'main' into online_srv
lzh222333 Mar 23, 2021
46cd576
Online Serving V4
lzh222333 Mar 26, 2021
9bf819e
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 Mar 26, 2021
ee45a78
Merge branch 'main' into online_srv
lzh222333 Mar 26, 2021
1f2d2c9
online debug
lzh222333 Mar 30, 2021
eae94d1
Merge remote-tracking branch 'microsoft/qlib/main' into online_srv
lzh222333 Mar 30, 2021
544365f
ensemble & get_exp & dataset_pickle
lzh222333 Mar 31, 2021
3724273
Merge remote-tracking branch 'microsoft/qlib/main' into online_srv
lzh222333 Mar 31, 2021
edcd7b1
bug fixed & code format
lzh222333 Mar 31, 2021
bd7a1c1
trainer & group & collect & ensemble
lzh222333 Apr 2, 2021
431a9c9
online serving v5
lzh222333 Apr 2, 2021
cb42e99
bug fixed & examples fire
lzh222333 Apr 7, 2021
1dbb561
Fix some API(for lb nn)
you-n-g Apr 7, 2021
7160579
Merge branch 'online_srv_wd' into online_srv
you-n-g Apr 7, 2021
c20eb5c
format code
lzh222333 Apr 8, 2021
18bf4b5
parameter adjustment
you-n-g Apr 8, 2021
a366c11
Update features for hyb nn
you-n-g Apr 9, 2021
cca43cf
Refactor update & modification when running NN
you-n-g Apr 11, 2021
b15e5e3
Fix the multi-processing bug
you-n-g Apr 12, 2021
5095b2a
simulator & examples
lzh222333 Apr 13, 2021
cec318f
online serving V7
lzh222333 Apr 16, 2021
de0a0c0
bug fixed
lzh222333 Apr 22, 2021
319396c
online serving V8
lzh222333 Apr 25, 2021
0058f7d
Online Serving V8
lzh222333 Apr 26, 2021
42f5100
update collector
lzh222333 Apr 27, 2021
36ab078
filter
Apr 28, 2021
45c6dfc
filter
Apr 28, 2021
fa4511c
filter
Apr 28, 2021
40cf83e
online serving V9 middle status
lzh222333 Apr 28, 2021
6f66934
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 Apr 28, 2021
67c5740
OnlineServing V9
lzh222333 Apr 29, 2021
2b7ffa1
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 Apr 29, 2021
1c99fb3
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 May 6, 2021
84c56f1
docs and bug fixed
lzh222333 May 6, 2021
846c64f
fix param
binlins May 6, 2021
9dfd001
online serving v10
lzh222333 May 7, 2021
bec65dd
add document and reindex
binlins May 7, 2021
08edb92
add flt_data doc
binlins May 7, 2021
060a32e
Merge branch 'online_srv' into online_srv_blin
you-n-g May 7, 2021
1c605e5
Merge pull request #14 from you-n-g/online_srv_blin
you-n-g May 7, 2021
4c23261
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 May 9, 2021
f5ded06
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 May 9, 2021
370b6aa
logger & doc
lzh222333 May 9, 2021
d71a666
Online serving V11
lzh222333 May 13, 2021
ebd01e0
Online Serving V11
lzh222333 May 14, 2021
aef3f18
format code
lzh222333 May 14, 2021
a986379
bug fixed
lzh222333 May 14, 2021
8c3a08b
Finally!
lzh222333 May 17, 2021
3 changes: 3 additions & 0 deletions docs/advanced/serial.rst
@@ -14,6 +14,9 @@ Serializable Class

``Qlib`` provides a base class ``qlib.utils.serial.Serializable``, whose state can be dumped into or loaded from disk in `pickle` format.
When users dump the state of a ``Serializable`` instance, the attributes of the instance whose name **does not** start with `_` will be saved on the disk.
However, users can use ``config`` method or override ``default_dump_all`` attribute to prevent this feature.

Users can also override ``pickle_backend`` attribute to choose a pickle backend. The supported value is "pickle" (default and common) and "dill" (dump more things such as function, more information in `here <https://pypi.org/project/dill/>`_).

Example
==========================
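The dump rule described in this hunk (attributes whose names start with `_` are skipped unless the user opts in) can be imitated in plain Python. The following is only a minimal sketch under stated assumptions: `SketchSerializable`, its `dump_all` flag, and `Model` are hypothetical names made up for illustration, and this is not the code of ``qlib.utils.serial.Serializable``.

```python
import pickle


class SketchSerializable:
    """Hypothetical stand-in; NOT qlib's actual Serializable implementation."""

    # Assumption for this sketch: when False, "_"-prefixed attributes are skipped on dump.
    dump_all = False

    def __getstate__(self):
        # Keep every attribute when dump_all is set, otherwise only the public ones.
        return {
            name: value
            for name, value in self.__dict__.items()
            if self.dump_all or not name.startswith("_")
        }


class Model(SketchSerializable):
    def __init__(self):
        self.params = {"lr": 0.1}  # public attribute: saved
        self._cache = [1, 2, 3]  # private attribute: dropped on dump


restored = pickle.loads(pickle.dumps(Model()))
print(sorted(restored.__dict__))  # ['params']
```

Setting `dump_all = True` on a subclass would keep `_cache` as well, which mirrors the opt-out that the documentation attributes to ``config`` and ``default_dump_all``.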
27 changes: 15 additions & 12 deletions docs/advanced/task_management.rst
@@ -19,7 +19,7 @@ An example of the entire process is shown `here <https://github.com/microsoft/ql

Task Generating
===============
A ``task`` consists of `Model`, `Dataset`, `Record` or anything added by users.
A ``task`` consists of `Model`, `Dataset`, `Record`, or anything added by users.
The specific task template can be viewed in
`Task Section <../component/workflow.html#task-section>`_.
Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` by task template.
@@ -30,15 +30,15 @@ Here is the base class of ``TaskGen``:
:members:

``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of the dataset in different date segments.
This class allows users to verify the effect of data from different periods on the model in one experiment. More information in `here <../reference/api.html#TaskGen>`_.
This class allows users to verify the effect of data from different periods on the model in one experiment. More information is `here <../reference/api.html#TaskGen>`_.

Task Storing
===============
To achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
``TaskManager`` can fetch undone tasks automatically and manage the lifecycle of a set of tasks with error handling.
Users **MUST** finished the configuration of `MongoDB <https://www.mongodb.com/>`_ when using this module.
Users **MUST** finish the configuration of `MongoDB <https://www.mongodb.com/>`_ when using this module.

Users need to provide the MongoDB URL and database name for using ``TaskManager`` in `initialization <../start/initialization.html#Parameters>`_ or make statement like this.
Users need to provide the MongoDB URL and database name for using ``TaskManager`` in `initialization <../start/initialization.html#Parameters>`_ or make a statement like this.

.. code-block:: python

@@ -55,32 +55,35 @@ More information of ``Task Manager`` can be found in `here <../reference/api.htm

Task Training
===============
#FIXME: Trainer
After generating and storing those ``task``, it's time to run the ``task`` which are in the *WAITING* status.
After generating and storing those ``task``, it's time to run the ``task`` which is in the *WAITING* status.
``Qlib`` provides a method called ``run_task`` to run those ``task`` in task pool, however, users can also customize how tasks are executed.
An easy way to get the ``task_func`` is using ``qlib.model.trainer.task_train`` directly.
It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*.

.. autofunction:: qlib.workflow.task.manage.run_task

Meanwhile, ``Qlib`` provides a module called ``Trainer``.
``Trainer`` will train a list of tasks and return a list of model recorder.
``Qlib`` offer two kind of Trainer, TrainerR is the simplest way and TrainerRM is based on TaskManager to help manager tasks lifecycle automatically.

.. autoclass:: qlib.model.trainer.Trainer
:members:

``Trainer`` will train a list of tasks and return a list of model recorders.
``Qlib`` offer two kinds of Trainer, TrainerR is the simplest way and TrainerRM is based on TaskManager to help manager tasks lifecycle automatically.
If you do not want to use ``Task Manager`` to manage tasks, then use TrainerR to train a list of tasks generated by ``TaskGen`` is enough.
More information is in `here <../reference/api.html#Trainer>`_.
`Here <../reference/api.html#Trainer>`_ are the details about different ``Trainer``.

Task Collecting
===============
To collect the results of ``task`` after training, ``Qlib`` provides `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_ to collect the results in a readable, expandable and loosely-coupled way.

`Collector <../reference/api.html#Collector>`_ can collect object from everywhere and process them such as merging, grouping, averaging and so on. It has 2 step action including ``collect`` (collect anything in a dict) and ``process_collect`` (process collected dict).
`Collector <../reference/api.html#Collector>`_ can collect objects from everywhere and process them such as merging, grouping, averaging and so on. It has 2 step action including ``collect`` (collect anything in a dict) and ``process_collect`` (process collected dict).

`Group <../reference/api.html#Group>`_ also has 2 steps including ``group`` (can group a set of object based on `group_func` and change them to a dict) and ``reduce`` (can make a dict become an ensemble based on some rule).
For example: {(A,B,C1): object, (A,B,C2): object} ---``group``---> {(A,B): {C1: object, C2: object}} ---``reduce``---> {(A,B): object}

`Ensemble <../reference/api.html#Ensemble>`_ can merge the objects in an ensemble.
For example: {C1: object, C2: object} ---``Ensemble``---> object

So the hierarchy is ``Collector``'s second step correspond to ``Group``. And ``Group``'s second step correspond to ``Ensemble``.
So the hierarchy is ``Collector``'s second step corresponds to ``Group``. And ``Group``'s second step correspond to ``Ensemble``.

For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_
For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.
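The ``group`` and ``reduce`` steps described in the Task Collecting text of this hunk can be sketched with plain dictionaries. This is an illustration under assumptions, not qlib's ``Group`` or ``Ensemble`` classes: the helper names `group`, `reduce_groups`, and `average_ensemble` are hypothetical, and the collected "objects" are plain floats so that averaging is well defined.

```python
from collections import defaultdict


def group(collected, group_func):
    """Split each key into (outer, inner) and nest the values as {outer: {inner: obj}}."""
    grouped = defaultdict(dict)
    for key, obj in collected.items():
        outer, inner = group_func(key)
        grouped[outer][inner] = obj
    return dict(grouped)


def average_ensemble(members):
    """Merge {inner: number} into a single number by averaging."""
    return sum(members.values()) / len(members)


def reduce_groups(grouped, ensemble_func):
    """Apply an ensemble function to every inner dict."""
    return {outer: ensemble_func(members) for outer, members in grouped.items()}


collected = {("A", "B", "C1"): 1.0, ("A", "B", "C2"): 3.0}
# Group on everything except the last key element, as in the (A,B,C1) example.
grouped = group(collected, lambda key: (key[:-1], key[-1]))
result = reduce_groups(grouped, average_ensemble)
print(grouped)  # {('A', 'B'): {'C1': 1.0, 'C2': 3.0}}
print(result)  # {('A', 'B'): 2.0}
```

Keeping the grouping rule and the ensemble rule as separate callables is what makes the two steps loosely coupled: either can be swapped without touching the other.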
6 changes: 3 additions & 3 deletions docs/component/online.rst
@@ -9,12 +9,12 @@ Online Serving
Introduction
=============
In addition to backtesting, one way to test a model is effective is to make predictions in real market conditions or even do real trading based on those predictions.
``Online Serving`` is a set of module for online models using latest data,
``Online Serving`` is a set of modules for online models using the latest data,
which including `Online Manager <#Online Manager>`_, `Online Strategy <#Online Strategy>`_, `Online Tool <#Online Tool>`_, `Updater <#Updater>`_.

`Here <https://github.com/microsoft/qlib/tree/main/examples/online_srv>`_ are several examples for reference, which demonstrate different features of ``Online Serving``.
If you have many models or `task` need to be managed, please consider `Task Management <../advanced/task_management.html>`_.
The `examples <https://github.com/microsoft/qlib/tree/main/examples/online_srv>`_ maybe based on `Task Management <../advanced/task_management.html>`_ such as ``TrainerRM`` or ``Collector``.
If you have many models or `task` needs to be managed, please consider `Task Management <../advanced/task_management.html>`_.
The `examples <https://github.com/microsoft/qlib/tree/main/examples/online_srv>`_ are based on some components in `Task Management <../advanced/task_management.html>`_ such as ``TrainerRM`` or ``Collector``.

Online Manager
=============
5 changes: 4 additions & 1 deletion docs/reference/api.rst
@@ -226,4 +226,7 @@ Serializable
--------------------

.. automodule:: qlib.utils.serial.Serializable
:members:
:members:



4 changes: 2 additions & 2 deletions examples/model_rolling/task_manager_rolling.py
@@ -2,8 +2,8 @@
# Licensed under the MIT License.

"""
This example shows how a TrainerRM work based on TaskManager with rolling tasks.
After training, how to collect the rolling results will be showed in task_collecting.
This example shows how a TrainerRM works based on TaskManager with rolling tasks.
After training, how to collect the rolling results will be shown in task_collecting.
"""

from pprint import pprint
14 changes: 6 additions & 8 deletions examples/online_srv/online_management_simulate.py
@@ -2,15 +2,15 @@
# Licensed under the MIT License.

"""
This examples is about how can simulate the OnlineManager based on rolling tasks.
This example is about how can simulate the OnlineManager based on rolling tasks.
"""

import fire
import qlib
from qlib.model.trainer import DelayTrainerRM
from qlib.model.trainer import DelayTrainerR, DelayTrainerRM, TrainerR, TrainerRM
from qlib.workflow import R
from qlib.workflow.online.manager import OnlineManager
from qlib.workflow.online.strategy import RollingAverageStrategy
from qlib.workflow.online.strategy import RollingStrategy
from qlib.workflow.task.gen import RollingGen
from qlib.workflow.task.manage import TaskManager

@@ -112,10 +112,10 @@ def __init__(
qlib.init(provider_uri=provider_uri, region=region, mongo=mongo_conf)
self.rolling_gen = RollingGen(
Review comment (Collaborator, Author):
RollingGen is a legacy implementation.
We should consider whether it is still the best practice under the new design.

step=rolling_step, rtype=RollingGen.ROLL_SD, ds_extra_mod_func=None
) # The rolling tasks generator, ds_extra_mod_func is None because we just need simulate to 2018-10-31 and needn't change handler end time.
self.trainer = DelayTrainerRM(self.exp_name, self.task_pool)
) # The rolling tasks generator, ds_extra_mod_func is None because we just need to simulate to 2018-10-31 and needn't change the handler end time.
self.trainer = DelayTrainerRM(self.exp_name, self.task_pool) # Also can be TrainerR, TrainerRM, DelayTrainerR
self.rolling_online_manager = OnlineManager(
RollingAverageStrategy(exp_name, task_template=tasks, rolling_gen=self.rolling_gen),
RollingStrategy(exp_name, task_template=tasks, rolling_gen=self.rolling_gen),
trainer=self.trainer,
begin_time=self.start_time,
)
@@ -138,8 +138,6 @@ def main(self):
print(self.rolling_online_manager.get_collector()())
print("========== signals ==========")
print(self.rolling_online_manager.get_signals())
print("========== online history ==========")
print(self.rolling_online_manager.history)


if __name__ == "__main__":
66 changes: 43 additions & 23 deletions examples/online_srv/rolling_online_management.py
@@ -2,21 +2,20 @@
# Licensed under the MIT License.

"""
This example show how OnlineManager works with rolling tasks.
There are two parts including first train and routine.
This example shows how OnlineManager works with rolling tasks.
There are four parts including first train, routine 1, add strategy and routine 2.
Firstly, the OnlineManager will finish the first training and set trained models to `online` models.
Next, the OnlineManager will finish a routine process, including update online prediction -> prepare signals -> prepare tasks -> prepare new models -> reset online models
Next, the OnlineManager will finish a routine process, including update online prediction -> prepare tasks -> prepare new models -> prepare signals
Then, we will add some new strategies to the OnlineManager. This will finish first training of new strategies.
Finally, the OnlineManager will finish second routine and update all strategies.
"""

import os
from pathlib import Path
import pickle
import fire
import qlib
from qlib.workflow import R
from qlib.workflow.online.strategy import RollingAverageStrategy
from qlib.workflow.online.strategy import RollingStrategy
from qlib.workflow.task.gen import RollingGen
from qlib.workflow.task.manage import TaskManager
from qlib.workflow.online.manager import OnlineManager

data_handler_config = {
@@ -84,77 +83,98 @@ def __init__(
task_url="mongodb://10.0.0.4:27017/",
task_db_name="rolling_db",
rolling_step=550,
tasks=[task_xgboost_config, task_lgb_config],
tasks=[task_xgboost_config],
add_tasks=[task_lgb_config],
):
mongo_conf = {
"task_url": task_url, # your MongoDB url
"task_db_name": task_db_name, # database name
}
qlib.init(provider_uri=provider_uri, region=region, mongo=mongo_conf)
self.tasks = tasks
self.add_tasks = add_tasks
self.rolling_step = rolling_step
strategy = []
strategies = []
for task in tasks:
name_id = task["model"]["class"] # NOTE: Assumption: The model class can specify only one strategy
strategy.append(
RollingAverageStrategy(
strategies.append(
RollingStrategy(
name_id,
task,
RollingGen(step=rolling_step, rtype=RollingGen.ROLL_SD),
)
)

self.rolling_online_manager = OnlineManager(strategy)
self.collector = self.rolling_online_manager.get_collector()
self.rolling_online_manager = OnlineManager(strategies)

_ROLLING_MANAGER_PATH = (
".RollingOnlineExample" # the OnlineManager will dump to this file, for it can be loaded when calling routine.
)

# Reset all things to the first status, be careful to save important data
def reset(self):
for task in self.tasks:
for task in self.tasks + self.add_tasks:
name_id = task["model"]["class"]
TaskManager(name_id).remove()
exp = R.get_exp(experiment_name=name_id)
for rid in exp.list_recorders():
exp.delete_recorder(rid)

if os.path.exists(self._ROLLING_MANAGER_PATH):
os.remove(self._ROLLING_MANAGER_PATH)
if os.path.exists(self._ROLLING_MANAGER_PATH):
os.remove(self._ROLLING_MANAGER_PATH)

def first_run(self):
print("========== reset ==========")
self.reset()
print("========== first_run ==========")
self.rolling_online_manager.first_train()
print("========== collect results ==========")
print(self.rolling_online_manager.get_collector()())
print("========== dump ==========")
self.rolling_online_manager.to_pickle(self._ROLLING_MANAGER_PATH)
print("========== collect results ==========")
print(self.collector())

def routine(self):
print("========== load ==========")
with Path(self._ROLLING_MANAGER_PATH).open("rb") as f:
self.rolling_online_manager = pickle.load(f)
self.rolling_online_manager = OnlineManager.load(self._ROLLING_MANAGER_PATH)
print("========== routine ==========")
self.rolling_online_manager.routine()
print("========== collect results ==========")
print(self.collector())
print(self.rolling_online_manager.get_collector()())
print("========== signals ==========")
print(self.rolling_online_manager.get_signals())
print("========== dump ==========")
self.rolling_online_manager.to_pickle(self._ROLLING_MANAGER_PATH)

def add_strategy(self):
print("========== load ==========")
self.rolling_online_manager = OnlineManager.load(self._ROLLING_MANAGER_PATH)
print("========== add strategy ==========")
strategies = []
for task in self.add_tasks:
name_id = task["model"]["class"] # NOTE: Assumption: The model class can specify only one strategy
strategies.append(
RollingStrategy(
name_id,
task,
RollingGen(step=self.rolling_step, rtype=RollingGen.ROLL_SD),
)
)
self.rolling_online_manager.add_strategy(strategies=strategies)
print("========== dump ==========")
self.rolling_online_manager.to_pickle(self._ROLLING_MANAGER_PATH)

def main(self):
self.first_run()
self.routine()
self.add_strategy()
self.routine()


if __name__ == "__main__":
####### to train the first version's models, use the command below
# python rolling_online_management.py first_run

####### to update the models and predictions after the trading time, use the command below
# python rolling_online_management.py after_day
# python rolling_online_management.py routine

####### to define your own parameters, use `--`
# python rolling_online_management.py first_run --exp_name='your_exp_name' --rolling_step=40
6 changes: 3 additions & 3 deletions examples/online_srv/update_online_pred.py
@@ -2,10 +2,10 @@
# Licensed under the MIT License.

"""
This example show how OnlineTool works when we need update prediction.
This example shows how OnlineTool works when we need update prediction.
There are two parts including first_train and update_online_pred.
Firstly, we will finish the training and set the trained model to `online` model.
Next, we will finish updating online prediction.
Firstly, we will finish the training and set the trained models to the `online` models.
Next, we will finish updating online predictions.
"""
import fire
import qlib
5 changes: 4 additions & 1 deletion qlib/data/dataset/__init__.py
@@ -172,7 +172,10 @@ def _prepare_seg(self, slc: slice, **kwargs):
----------
slc : slice
"""
return self.handler.fetch(slc, **kwargs, **self.fetch_kwargs)
if hasattr(self, "fetch_kwargs"):
return self.handler.fetch(slc, **kwargs, **self.fetch_kwargs)
else:
return self.handler.fetch(slc, **kwargs)

def prepare(
self,
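The `hasattr` guard added to `_prepare_seg` suggests that some dataset objects may lack the `fetch_kwargs` attribute. One plausible cause, which is an assumption here and is not stated in the PR, is an instance that was pickled before the attribute existed. The sketch below reproduces that situation with hypothetical `Handler` and `Dataset` classes (not qlib's) and shows an equivalent, more compact guard using `getattr` with a default.

```python
class Handler:
    """Hypothetical handler that just echoes what it was asked to fetch."""

    def fetch(self, slc, **kwargs):
        return slc, kwargs


class Dataset:
    """Hypothetical dataset; only the guard logic mirrors the diff."""

    def __init__(self, handler, fetch_kwargs=None):
        self.handler = handler
        self.fetch_kwargs = fetch_kwargs or {}

    def _prepare_seg(self, slc, **kwargs):
        # Fall back to no extra kwargs when the attribute is missing.
        extra = getattr(self, "fetch_kwargs", {})
        return self.handler.fetch(slc, **kwargs, **extra)


# Simulate an instance restored without running __init__, so that it has
# no fetch_kwargs attribute (as an old pickle might).
legacy = Dataset.__new__(Dataset)
legacy.handler = Handler()

current = Dataset(Handler(), fetch_kwargs={"col_set": "feature"})

print(legacy._prepare_seg(slice(0, 5)))
print(current._prepare_seg(slice(0, 5), level="datetime"))
```

Both forms behave the same for the two cases shown; the `getattr` variant simply avoids duplicating the `fetch` call.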