init version of online serving and rolling #290

Merged
merged 77 commits into from
May 17, 2021
04b916c
safe yaml loader
you-n-g Feb 16, 2021
83237ba
yml afe load
you-n-g Feb 17, 2021
1e5cf1c
init version of online serving and rolling
you-n-g Feb 26, 2021
24024d5
qlib auto init basedon project & black format
you-n-g Feb 27, 2021
c4733f6
Merge pull request #1 from you-n-g/online_srv
lzh222333 Mar 2, 2021
b84156f
Consider more situations about task_config.
lzh222333 Mar 3, 2021
05cf0e1
add task_generator method and update some hint
lzh222333 Mar 3, 2021
fd2c1ba
Update some hint
lzh222333 Mar 3, 2021
2882929
Add an example about workflow using RollingGen.
lzh222333 Mar 3, 2021
a244f87
modified the comments
lzh222333 Mar 8, 2021
def132e
modified format and added TaskCollector
lzh222333 Mar 8, 2021
83dbdfb
finished document and example
lzh222333 Mar 9, 2021
e2f5827
update task manager
lzh222333 Mar 10, 2021
2ca2071
format code
lzh222333 Mar 10, 2021
48f0fc1
first version of online serving
lzh222333 Mar 11, 2021
0df88c0
bug fixed and update collect.py
lzh222333 Mar 11, 2021
44a7dc0
update docs and fix duplicated pred bug
you-n-g Mar 12, 2021
5de7870
Merge branch 'online_srv' of github.com:you-n-g/qlib into online_srv
you-n-g Mar 12, 2021
6d8aa21
the second version of online serving
lzh222333 Mar 12, 2021
9d84d38
format code and add example
lzh222333 Mar 12, 2021
e4e8a4a
fix task name & add cur_path
you-n-g Mar 12, 2021
8362780
fix import bug
you-n-g Mar 14, 2021
646d899
update docstring and document
lzh222333 Mar 15, 2021
0bc49da
add task management to index.rst
lzh222333 Mar 15, 2021
e3730b3
more clearly structure
lzh222333 Mar 16, 2021
5953365
finished update_online_pred demo
lzh222333 Mar 16, 2021
d33041d
format example
lzh222333 Mar 16, 2021
8abdd63
online_serving V3
lzh222333 Mar 18, 2021
84d5318
Merge branch 'online_srv_wd' into online_srv
you-n-g Mar 19, 2021
d66d4ec
Merge branch 'main' into online_srv
lzh222333 Mar 23, 2021
46cd576
Online Serving V4
lzh222333 Mar 26, 2021
9bf819e
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 Mar 26, 2021
ee45a78
Merge branch 'main' into online_srv
lzh222333 Mar 26, 2021
1f2d2c9
online debug
lzh222333 Mar 30, 2021
eae94d1
Merge remote-tracking branch 'microsoft/qlib/main' into online_srv
lzh222333 Mar 30, 2021
544365f
ensemble & get_exp & dataset_pickle
lzh222333 Mar 31, 2021
3724273
Merge remote-tracking branch 'microsoft/qlib/main' into online_srv
lzh222333 Mar 31, 2021
edcd7b1
bug fixed & code format
lzh222333 Mar 31, 2021
bd7a1c1
trainer & group & collect & ensemble
lzh222333 Apr 2, 2021
431a9c9
online serving v5
lzh222333 Apr 2, 2021
cb42e99
bug fixed & examples fire
lzh222333 Apr 7, 2021
1dbb561
Fix some API(for lb nn)
you-n-g Apr 7, 2021
7160579
Merge branch 'online_srv_wd' into online_srv
you-n-g Apr 7, 2021
c20eb5c
format code
lzh222333 Apr 8, 2021
18bf4b5
parameter adjustment
you-n-g Apr 8, 2021
a366c11
Update features for hyb nn
you-n-g Apr 9, 2021
cca43cf
Refactor update & modification when running NN
you-n-g Apr 11, 2021
b15e5e3
Fix the multi-processing bug
you-n-g Apr 12, 2021
5095b2a
simulator & examples
lzh222333 Apr 13, 2021
cec318f
online serving V7
lzh222333 Apr 16, 2021
de0a0c0
bug fixed
lzh222333 Apr 22, 2021
319396c
online serving V8
lzh222333 Apr 25, 2021
0058f7d
Online Serving V8
lzh222333 Apr 26, 2021
42f5100
update collector
lzh222333 Apr 27, 2021
36ab078
filter
Apr 28, 2021
45c6dfc
filter
Apr 28, 2021
fa4511c
filter
Apr 28, 2021
40cf83e
online serving V9 middle status
lzh222333 Apr 28, 2021
6f66934
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 Apr 28, 2021
67c5740
OnlineServing V9
lzh222333 Apr 29, 2021
2b7ffa1
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 Apr 29, 2021
1c99fb3
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 May 6, 2021
84c56f1
docs and bug fixed
lzh222333 May 6, 2021
846c64f
fix param
binlins May 6, 2021
9dfd001
online serving v10
lzh222333 May 7, 2021
bec65dd
add document and reindex
binlins May 7, 2021
08edb92
add flt_data doc
binlins May 7, 2021
060a32e
Merge branch 'online_srv' into online_srv_blin
you-n-g May 7, 2021
1c605e5
Merge pull request #14 from you-n-g/online_srv_blin
you-n-g May 7, 2021
4c23261
Merge branch 'online_srv' of https://github.com/you-n-g/qlib into onl…
lzh222333 May 9, 2021
f5ded06
Merge remote-tracking branch 'microsoft/main' into online_srv
lzh222333 May 9, 2021
370b6aa
logger & doc
lzh222333 May 9, 2021
d71a666
Online serving V11
lzh222333 May 13, 2021
ebd01e0
Online Serving V11
lzh222333 May 14, 2021
aef3f18
format code
lzh222333 May 14, 2021
a986379
bug fixed
lzh222333 May 14, 2021
8c3a08b
Finally!
lzh222333 May 17, 2021
67 changes: 67 additions & 0 deletions docs/advanced/task_managment.rst
@@ -0,0 +1,67 @@
.. _task_managment:

=================================
Task Management
=================================
.. currentmodule:: qlib


Introduction
=============

The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely-coupled way, but ``qrun`` can only execute one ``task`` at a time. To generate and execute different tasks automatically, the Task Management module provides a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_.
With this module, users can run their ``task`` automatically over different periods, with different losses, or even with different models.

An example of the entire process is shown `here <>`_.

Task Generating
===============
A ``task`` consists of `Model`, `Dataset`, `Record`, or anything else added by users.
The specific task template can be viewed in
`Task Section <../component/workflow.html#task-section>`_.
Even though the task template is fixed, users can use ``TaskGen`` to generate different ``task`` from the task template.

Here is the base class of ``TaskGen``:

.. autoclass:: qlib.workflow.task.gen.TaskGen
    :members:

``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` whose datasets cover different date segments.
This allows users to verify the effect of data from different periods on the model in one experiment.
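
The rolling idea described above can be sketched without Qlib. This is a minimal illustration, not the implementation of ``RollingGen``: the function name ``roll_segments`` and its parameters are hypothetical, and it steps over calendar days, whereas a real generator would most likely step over trading days.

```python
from datetime import date, timedelta


def roll_segments(segments, step_days, last_day):
    """Slide every segment forward by `step_days` until the test window starts after `last_day`.

    Hypothetical helper for illustration only; it is not part of Qlib.
    """
    step = timedelta(days=step_days)
    shift = timedelta(0)
    tasks = []
    while segments["test"][0] + shift <= last_day:
        # Shift train/valid/test by the same offset (a sliding window).
        rolled = {name: (start + shift, end + shift) for name, (start, end) in segments.items()}
        # Each rolled task only tests on one step, capped at the last available day.
        test_start = rolled["test"][0]
        rolled["test"] = (test_start, min(test_start + step - timedelta(days=1), last_day))
        tasks.append(rolled)
        shift += step
    return tasks


base_segments = {
    "train": (date(2008, 1, 1), date(2014, 12, 31)),
    "valid": (date(2015, 1, 1), date(2016, 12, 31)),
    "test": (date(2017, 1, 1), date(2020, 8, 1)),
}
rolled_tasks = roll_segments(base_segments, step_days=550, last_day=date(2020, 8, 1))
for task in rolled_tasks:
    print(task["test"])
```

Each element of ``rolled_tasks`` could then replace the ``segments`` entry of a copy of the task template.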

Task Storing
===============
To achieve higher efficiency and make cluster operation possible, ``Task Manager`` stores all tasks in `MongoDB <https://www.mongodb.com/>`_.
Users **MUST** finish the configuration of `MongoDB <https://www.mongodb.com/>`_ before using this module.

Users need to provide the URL and database name used for ``task`` storing, like this:

.. code-block:: python

    from qlib.config import C
    C["mongo"] = {
        "task_url" : "mongodb://localhost:27017/",  # you may need to change this to your own URL
        "task_db_name" : "rolling_db"  # you can customize the database name
    }

The CRUD methods of ``task`` can be found in ``TaskManager``. More methods can be seen on `GitHub <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.

.. autoclass:: qlib.workflow.task.manage.TaskManager
    :members:

Task Running
===============
After generating and storing those ``task``, it's time to run the ``task`` that are in the *WAITING* status.
``Qlib`` provides a method to run those ``task`` in the task pool; however, users can also customize how tasks are executed.
An easy way to get the ``task_func`` is to use ``qlib.model.trainer.task_train`` directly.
It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, and *Record*.

.. autofunction:: qlib.workflow.task.manage.run_task
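
The fetch-and-run loop described above can be sketched with an in-memory pool. This is only an illustration under stated assumptions: ``InMemoryTaskPool`` and ``run_all`` are hypothetical names, the pool lives in a Python list instead of MongoDB, and the ``running`` and ``done`` status labels are assumed (the text above only names *WAITING*).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical status labels; only WAITING is named in the documentation above.
WAITING, RUNNING, DONE = "waiting", "running", "done"


class InMemoryTaskPool:
    """A toy stand-in for a MongoDB-backed task pool (illustration only)."""

    def __init__(self) -> None:
        self._tasks: List[Dict] = []

    def create_task(self, task_defs: List[dict]) -> None:
        # Every new task starts in the WAITING status.
        for task_def in task_defs:
            self._tasks.append({"def": task_def, "status": WAITING, "res": None})

    def fetch_waiting(self) -> Optional[Dict]:
        # Hand out the first waiting task and mark it as running.
        for task in self._tasks:
            if task["status"] == WAITING:
                task["status"] = RUNNING
                return task
        return None

    def statuses(self) -> List[str]:
        return [task["status"] for task in self._tasks]


def run_all(task_func: Callable[[dict], object], pool: InMemoryTaskPool) -> int:
    """Run `task_func` on every waiting task and return how many tasks ran."""
    ran = 0
    while True:
        task = pool.fetch_waiting()
        if task is None:
            return ran
        task["res"] = task_func(task["def"])
        task["status"] = DONE
        ran += 1


pool = InMemoryTaskPool()
pool.create_task([{"model": "lgb"}, {"model": "xgboost"}])
count = run_all(lambda task_def: task_def["model"].upper(), pool)
print(count, pool.statuses())
```

Marking a task as running before executing it is what would let several workers share one pool without training the same task twice; a real shared store would need that status change to be atomic.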

Task Collecting
===============
To see the results of ``task`` after running, ``Qlib`` provides a task collector to collect the tasks, optionally filtered by a condition.
The collector returns a dict whose keys are defined by users from the task config and whose values are the prediction scores from ``pred.pkl``.

.. autoclass:: qlib.workflow.task.collect.TaskCollector
    :members:
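
The key-and-filter behaviour of a collector can be sketched as follows. This is a simplified illustration, not the ``TaskCollector`` API: the function ``collect``, the flat ``test_end`` field, and the in-memory ``finished_tasks`` list are assumptions standing in for recorders and ``pred.pkl`` files.

```python
def collect(records, get_key, task_filter=None):
    """Build {key: prediction} from (task_config, prediction) pairs.

    Hypothetical helper: `get_key` names each result, `task_filter` is optional.
    """
    results = {}
    for task_config, pred in records:
        if task_filter is not None and not task_filter(task_config):
            continue
        results[get_key(task_config)] = pred
    return results


# Made-up finished tasks: (task config, predictions) pairs.
finished_tasks = [
    ({"task_key": "task_lgb", "test_end": "2019-06-30"}, [0.1, 0.2]),
    ({"task_key": "task_lgb", "test_end": "2020-08-01"}, [0.3]),
    ({"task_key": "task_xgboost", "test_end": "2019-06-30"}, [0.4]),
]


def get_key(task_config):
    return task_config["task_key"], task_config["test_end"]


def only_lgb_2019(task_config):
    return task_config["task_key"] == "task_lgb" and task_config["test_end"].startswith("2019")


collected = collect(finished_tasks, get_key, only_lgb_2019)
print(collected)
```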
445 changes: 445 additions & 0 deletions examples/taskmanager/task_manager_rolling.ipynb

Large diffs are not rendered by default.

108 changes: 108 additions & 0 deletions examples/taskmanager/task_manager_rolling.py
@@ -0,0 +1,108 @@
import qlib
from qlib.config import REG_CN
from qlib.workflow.task.gen import RollingGen, task_generator
from qlib.workflow.task.manage import TaskManager
from qlib.config import C

data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": 'csi100',
}

dataset_config = {
    "class": "DatasetH",
    "module_path": "qlib.data.dataset",
    "kwargs": {
        "handler": {
            "class": "Alpha158",
            "module_path": "qlib.contrib.data.handler",
            "kwargs": data_handler_config,
        },
        "segments": {
            "train": ("2008-01-01", "2014-12-31"),
            "valid": ("2015-01-01", "2016-12-31"),
            "test": ("2017-01-01", "2020-08-01"),
        },
    },
}

record_config = [
    {
        "class": "SignalRecord",
        "module_path": "qlib.workflow.record_temp",
    },
    {
        "class": "SigAnaRecord",
        "module_path": "qlib.workflow.record_temp",
    }
]

# use lgb
task_lgb_config = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
    },
    "dataset": dataset_config,
    "record": record_config,
}

# use xgboost
task_xgboost_config = {
    "model": {
        "class": "XGBModel",
        "module_path": "qlib.contrib.model.xgboost",
    },
    "dataset": dataset_config,
    "record": record_config,
}

provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)

C["mongo"] = {
    "task_url" : "mongodb://localhost:27017/", # maybe you need to change it to your url
    "task_db_name" : "rolling_db"
}

exp_name = 'rolling_exp' # experiment name, will be used as the experiment in MLflow
task_pool = 'rolling_task' # task pool name, will be used as the document in MongoDB

tasks = task_generator(
    task_xgboost_config, # default task name
    RollingGen(step=550,rtype=RollingGen.ROLL_SD), # generate different date segment
    task_lgb=task_lgb_config # use "task_lgb" as the task name
)

# Uncomment next two lines to see the generated tasks
# from pprint import pprint
# pprint(tasks)

tm = TaskManager(task_pool=task_pool)
tm.create_task(tasks) # all tasks will be saved to MongoDB

from qlib.workflow.task.manage import run_task
from qlib.workflow.task.collect import RollingCollector
from qlib.model.trainer import task_train

run_task(task_train, task_pool, experiment_name=exp_name) # all tasks will be trained using "task_train" method

def get_task_key(task_config):
    task_key = task_config["task_key"]
    rolling_end_timestamp = task_config["dataset"]["kwargs"]["segments"]["test"][1]
    #rolling_end_datatime = rolling_end_timestamp.to_pydatetime()
    return task_key, rolling_end_timestamp.strftime('%Y-%m-%d')

def my_filter(task_config):
    # only choose the results of "task_lgb" and test in 2019 from all tasks
    task_key, rolling_end = get_task_key(task_config)
    if task_key=="task_lgb" and rolling_end.startswith('2019'):
        return True
    return False

collector = RollingCollector(get_task_key, my_filter)
pred_rolling = collector(exp_name) # name tasks by "get_task_key" and filter tasks by "my_filter"
print(pred_rolling)
75 changes: 73 additions & 2 deletions qlib/__init__.py
@@ -3,19 +3,21 @@


__version__ = "0.6.3.99"
__version__bak = __version__ # This version is backup for QlibConfig.reset_qlib_version


import os
import yaml
import logging
import platform
import subprocess
from pathlib import Path
from .log import get_module_logger


# init qlib
def init(default_conf="client", **kwargs):
    from .config import C
    from .log import get_module_logger
    from .data.cache import H

    H.clear()
@@ -48,7 +50,6 @@ def init(default_conf="client", **kwargs):


def _mount_nfs_uri(C):
    from .log import get_module_logger

    LOG = get_module_logger("mount nfs", level=logging.INFO)

@@ -151,3 +152,73 @@ def init_from_yaml_conf(conf_path, **kwargs):
    config.update(kwargs)
    default_conf = config.pop("default_conf", "client")
    init(default_conf, **config)


def get_project_path(config_name="config.yaml") -> Path:
    """
    If users are building a project that follows the pattern below:
    - Qlib is a sub folder in the project path
    - There is a file named `config.yaml` in the project path.

    For example:
        If your project file system structure follows such a pattern

            <project_path>/
              - config.yaml
              - ...some folders...
              - qlib/

        This function will return <project_path>

        NOTE: link is not supported here.


    This method is often used when
    - users want to use a relative config path instead of hard-coding the qlib config path in code

    Raises
    ------
    FileNotFoundError:
        If project path is not found
    """
    cur_path = Path(__file__).absolute().resolve()
    while True:
        if (cur_path / config_name).exists():
            return cur_path
        if cur_path == cur_path.parent:
            raise FileNotFoundError("We can't find the project path")
        cur_path = cur_path.parent
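
The upward search used by `get_project_path` can be tried in isolation. The sketch below is a hedged illustration: `find_project_root` is a hypothetical name, and it takes an explicit start directory (an assumption made so that it can be exercised against a temporary folder) instead of starting from `__file__`.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, config_name: str = "config.yaml") -> Path:
    """Walk from `start` towards the filesystem root until `config_name` is found."""
    cur = start.resolve()
    while True:
        if (cur / config_name).exists():
            return cur
        # At the filesystem root, a path is its own parent: stop searching.
        if cur == cur.parent:
            raise FileNotFoundError(f"No {config_name} found above {start}")
        cur = cur.parent


with tempfile.TemporaryDirectory() as tmp:
    # Build <tmp>/project/config.yaml and <tmp>/project/qlib/workflow/
    project = Path(tmp).resolve() / "project"
    nested = project / "qlib" / "workflow"
    nested.mkdir(parents=True)
    (project / "config.yaml").write_text("conf_type: origin\n")
    found = find_project_root(nested)
    print(found == project)
```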


def auto_init(**kwargs):
    """
    This function will init qlib automatically with the following priority
    - Find the project configuration and init qlib
        - The parsing process will be affected by the `conf_type` of the configuration file
    - Init qlib with default config
    """

    try:
        pp = get_project_path()
    except FileNotFoundError:
        init(**kwargs)
    else:

        conf_pp = pp / "config.yaml"
        with conf_pp.open() as f:
            conf = yaml.safe_load(f)

        conf_type = conf.get("conf_type", "origin")
        if conf_type == "origin":
            # The type of config is just like original qlib config
            init_from_yaml_conf(conf_pp, **kwargs)
        elif conf_type == "ref":
            # This config type will be more convenient in the following scenario
            # - There is a shared configuration file and you don't want to edit it in place.
            # - The shared configuration may be updated later and you don't want to copy it.
            # - You have some customized config.
            qlib_conf_path = conf["qlib_cfg"]
            # `qlib_cfg_update` is optional; fall back to an empty dict so that `**` unpacking works
            qlib_conf_update = conf.get("qlib_cfg_update") or {}
            init_from_yaml_conf(qlib_conf_path, **qlib_conf_update, **kwargs)
        logger = get_module_logger("Initialization")
        logger.info(f"Auto load project config: {conf_pp}")
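
The two `conf_type` branches can be illustrated with plain dictionaries. This is a sketch under assumptions, not the code of `auto_init`: `resolve_qlib_conf` is a hypothetical helper that only decides which config file to load and which overrides to pass. Unlike the `**` unpacking in `auto_init`, it lets explicit keyword arguments win over `qlib_cfg_update` on duplicate keys, and it raises on an unknown `conf_type`.

```python
def resolve_qlib_conf(conf: dict, conf_path: str, **kwargs):
    """Return (path of the qlib config to load, keyword overrides) for a project config."""
    conf_type = conf.get("conf_type", "origin")
    if conf_type == "origin":
        # The project config itself is a qlib config.
        return conf_path, dict(kwargs)
    if conf_type == "ref":
        # The project config points at a shared qlib config plus local updates.
        update = conf.get("qlib_cfg_update") or {}
        return conf["qlib_cfg"], {**update, **kwargs}
    raise ValueError(f"Unknown conf_type: {conf_type!r}")


ref_conf = {
    "conf_type": "ref",
    "qlib_cfg": "/shared/qlib.yaml",  # made-up path for illustration
    "qlib_cfg_update": {"region": "cn", "provider_uri": "~/data"},
}
ref_path, ref_opts = resolve_qlib_conf(ref_conf, "config.yaml", region="us")
origin_path, origin_opts = resolve_qlib_conf({}, "config.yaml", region="us")
print(ref_path, ref_opts)
print(origin_path, origin_opts)
```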
17 changes: 17 additions & 0 deletions qlib/config.py
@@ -33,6 +33,9 @@ def __getattr__(self, attr):

        raise AttributeError(f"No such {attr} in self._config")

    def get(self, key, default=None):
        return self.__dict__["_config"].get(key, default)

    def __setitem__(self, key, value):
        self.__dict__["_config"][key] = value

@@ -310,8 +313,22 @@ def register(self):
        # clean up experiment when python program ends
        experiment_exit_handler()

        # Support resetting the qlib version (useful when users want to connect to a qlib server with an old version)
        self.reset_qlib_version()

        self._registered = True

    def reset_qlib_version(self):
        import qlib

        reset_version = self.get("qlib_reset_version", None)
        if reset_version is not None:
            qlib.__version__ = reset_version
        else:
            qlib.__version__ = getattr(qlib, "__version__bak")
            # Inside a class body, Python name mangling rewrites `qlib.__version__bak`
            # to `qlib._QlibConfig__version__bak`, so `getattr` with a string name is used instead.

    @property
    def registered(self):
        return self._registered
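
The behaviour that the comment in `reset_qlib_version` wonders about is Python's private name mangling: inside a class body, any identifier that starts with two underscores and does not end with two underscores is rewritten with the class name as a prefix, even in an attribute access on another object. A minimal sketch, using a stand-in namespace instead of the real `qlib` module and a hypothetical class named `Config`:

```python
import types

# Stand-in for the `qlib` module; module-level code is not affected by mangling.
fake_module = types.SimpleNamespace(__version__="0.6.3.99", __version__bak="0.6.3.99")


class Config:
    def read_direct(self):
        try:
            # Inside the class this is compiled as fake_module._Config__version__bak.
            return fake_module.__version__bak
        except AttributeError as err:
            return f"AttributeError: {err}"

    def read_getattr(self):
        # A string literal is never mangled, so this finds the real attribute.
        return getattr(fake_module, "__version__bak")


config = Config()
print(config.read_direct())
print(config.read_getattr())
```

`__version__` itself is not mangled because it ends with two underscores, which is why only the backup name needs the `getattr` workaround.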
Empty file added qlib/model/ens/__init__.py
Empty file.
12 changes: 10 additions & 2 deletions qlib/model/trainer.py
@@ -6,14 +6,16 @@
 from qlib.workflow.record_temp import SignalRecord


-def task_train(task_config: dict, experiment_name):
+def task_train(task_config: dict, experiment_name: str):
     """
     task based training

     Parameters
     ----------
     task_config : dict
         A dict describes a task setting.
+    experiment_name: str
+        The name of experiment
     """

     # model initiation
@@ -27,16 +29,22 @@ def task_train(task_config: dict, experiment_name):
         model.fit(dataset)
         recorder = R.get_recorder()
         R.save_objects(**{"params.pkl": model})
+        R.save_objects(param=task_config)  # keep the original format and datatype

         # generate records: prediction, backtest, and analysis
-        for record in task_config["record"]:
+        records = task_config.get("record", [])
+        if isinstance(records, dict):  # prevent only one dict
+            records = [records]
+        for record in records:
             if record["class"] == SignalRecord.__name__:
                 srconf = {"model": model, "dataset": dataset, "recorder": recorder}
+                record.setdefault("kwargs", {})
                 record["kwargs"].update(srconf)
                 sr = init_instance_by_config(record)
                 sr.generate()
             else:
                 rconf = {"recorder": recorder}
+                record.setdefault("kwargs", {})
                 record["kwargs"].update(rconf)
                 ar = init_instance_by_config(record)
                 ar.generate()
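
The record handling added in this hunk (accept a missing entry, a single dict, or a list, and make sure `kwargs` exists before updating it) can be sketched on its own. `normalize_records` is a hypothetical helper; unlike `task_train`, it works on deep copies so that the caller's task config is left untouched.

```python
import copy


def normalize_records(task_config: dict, extra_kwargs: dict) -> list:
    """Return record configs as a list, each with `extra_kwargs` merged into its kwargs."""
    records = task_config.get("record", [])
    if isinstance(records, dict):  # a single record given without a surrounding list
        records = [records]
    prepared = []
    for record in records:
        record = copy.deepcopy(record)  # assumption: avoid mutating the caller's config
        record.setdefault("kwargs", {})
        record["kwargs"].update(extra_kwargs)
        prepared.append(record)
    return prepared


single = {"record": {"class": "SignalRecord", "module_path": "qlib.workflow.record_temp"}}
prepared = normalize_records(single, {"recorder": "recorder-placeholder"})
print(prepared)
print(normalize_records({}, {"recorder": "recorder-placeholder"}))
```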
13 changes: 13 additions & 0 deletions qlib/workflow/task/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Task related workflow is implemented in this folder

A typical task workflow

| Step                   | Description                                    |
|------------------------+------------------------------------------------|
| TaskGen                | Generate tasks.                                |
| TaskManager (optional) | Manage generated tasks.                        |
| run task               | Retrieve tasks from TaskManager and run them.  |
"""