
Export env to python #7792

Merged
merged 71 commits into master from export_env_to_python on Apr 2, 2022
Conversation

lixinqi (Contributor) commented Mar 14, 2022

The lifetime of OneFlow's env should be tied to an object held by the oneflow Python module, so that it lasts until the Python interpreter shuts down.

@@ -199,6 +202,15 @@ Maybe<void> VirtualMachine::Receive(vm::InstructionMsgList* instr_list) {
// `ComputeInFuseMode` will be replaced by `Compute` soon.
instr_msg->mut_instr_type_id()->instruction_type().ComputeInFuseMode(instr_msg);
}
} else if (IsShuttingDown()) {
lixinqi (Contributor Author):

During the shutting-down phase, instructions are executed directly on the main thread.

@@ -210,7 +210,7 @@ def is_deprecated(func_or_class):

if not env_util.HasAllMultiClientEnvVars():
env_util.SetDefaultMultiClientEnvVars()
env_util.api_env_init()
_oneflow_global_unique_env_ = env_util.create_env()
lixinqi (Contributor Author):

The core change of this PR.

@@ -13,17 +13,6 @@
See the License for the specific language governing permissions and
limitations under the License.
"""
from oneflow.framework.env_util import api_all_device_placement as all_device_placement
lixinqi (Contributor Author):

These interfaces are all obsolete.

Contributor:

The all_device_placement interface is not obsolete and must not be deleted.

class TestCallWhenShuttingDown:
def __init__(self):
tensor = oneflow.ones((2, 2))
print(tensor)
lixinqi (Contributor Author):

If this line is commented out, the behavior diverges from PyTorch: PyTorch runs successfully, while OneFlow reports an error. This issue is certainly unrelated to this PR; we will open a separate issue to discuss it.

lixinqi (Contributor Author) commented Mar 14, 2022

On the pattern governing torch code executed during the shutting-down phase

First, let us examine PyTorch.

# script0
import torch

device_type = "cpu"

class Foo:
    def __init__(self):
        pass

    def __del__(self):
        tensor = torch.ones((8, 8), device=torch.device(device_type))
        print(tensor)

foo = Foo()

The example above works correctly; the output is as follows:

tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]])

If device_type is changed to "cuda", i.e. the example becomes:

# script1
import torch

device_type = "cuda"

class Foo:
    def __init__(self):
        pass

    def __del__(self):
        tensor = torch.ones((8, 8), device=torch.device(device_type))
        print(tensor)

foo = Foo()

this no longer works; the output is:

Exception ignored in: <bound method Foo.__del__ of <__main__.Foo object at 0x7f49326f8048>>
Traceback (most recent call last):
  File "a.py", line 11, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down

But if we first execute torch.ones once at the outer (module) scope:

# script2
import torch

device_type = "cuda"
torch.ones((32, 32), device=torch.device(device_type))

class Foo:
    def __init__(self):
        pass

    def __del__(self):
        tensor = torch.ones((8, 8), device=torch.device(device_type))
        print(tensor)

foo = Foo()

then it works correctly again and prints the tensor as expected.

Conjectured cause

The underlying rule is probably very simple: during the shutting-down phase, no new module may actually be imported. The observations above can then be explained as follows:

  1. script0 works because, beyond the torch module itself, CPU ops probably need no additional module.
  2. script1 fails because, beyond the torch module, CUDA ops probably need a dedicated CUDA-handling module.
  3. script2 works because a CUDA op was executed at the top-level scope, so the relevant CUDA Python modules had already been imported and cached.

The definitive cause can only be confirmed by reading the Python documentation or source code.
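The caching half of the conjecture can be checked with plain Python: a module imported while the import system is alive stays in sys.modules, so later use (conceptually, inside __del__ during shutdown) does not require a fresh import. A minimal sketch:

```python
import sys

# Modules imported while the import system is alive are cached in
# sys.modules; during finalization sys.meta_path is cleared, so only
# already-cached modules remain usable.
import json  # populate the cache now

# Later (e.g. inside __del__ at shutdown) this lookup would hit the
# cache instead of triggering a new import.
cached = "json" in sys.modules
print(cached)  # True
```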

lixinqi (Contributor Author) commented Mar 15, 2022

On calling py::gil_scoped_acquire from a non-main thread

My standalone test indicates that py::gil_scoped_acquire can be called safely from a non-main thread while the Python interpreter is exiting.

// example.cpp
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <thread>
#include <iostream>
#include <condition_variable>
#include <mutex>
#include <chrono>

void TestGILInNonMainThread() {
  std::mutex mutex;
  std::condition_variable cond;
  std::thread thread([&]{
    std::unique_lock<std::mutex> lock(mutex);
    cond.wait(lock, []{ return true; });
    std::cerr << "before_gil_scoped_acquire" << " ... ";
    std::this_thread::sleep_for(std::chrono::milliseconds(2000));
    pybind11::gil_scoped_acquire lock_gil{};
    std::cerr << "after_gil_scoped_acquire" << std::endl;
  });
  cond.notify_one();
  pybind11::gil_scoped_release unlock_gil{};
  thread.join();
}

PYBIND11_MODULE(example, m) {
    m.def("TestGILInNonMainThread", &TestGILInNonMainThread);
}

Compile it with:

g++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix)

# a.py
import example

class Foo:
    def __init__(self):
        pass

    def __del__(self):
        example.TestGILInNonMainThread()

foo = Foo()

The final output shows that py::gil_scoped_acquire works correctly.

$ python3 a.py
before_gil_scoped_acquire ... after_gil_scoped_acquire

lixinqi (Contributor Author) commented Mar 15, 2022

The dependency of virtual_machine.cpp on shutting-down handling has been removed; that part is rolled back to master's logic.

env_util.api_env_init()
_unittest_env_initilized = True

TestCase = unittest.TestCase
lixinqi (Contributor Author):

Our TestCase currently needs no extra behavior beyond the base class unittest.TestCase, so we export it directly.

lixinqi (Contributor Author):

@strint @caishenghang

lixinqi (Contributor Author) commented Mar 16, 2022

A thorough investigation of py::gil_scoped_acquire called from a non-main thread during Python finalization

This pybind11 issue, pybind/pybind11#3274, fully explains the problem, and it matches our observations exactly.

The remaining question is why the test above in #7792 (comment) inexplicably worked anyway. The reason is that the code above was run with Python 3.6, while the problem occurs on Python 3.8. If the same code is compiled into a Python extension as follows:

g++ -O3 -Wall -shared -std=c++11 -fPIC $(python3.8 -m pybind11 --includes) example.cpp -o example$(python3.8-config --extension-suffix)

then running python3.8 a.py reproduces this bug in Python itself:

$ python3.8 a.py
before_gil_scoped_acquire ...

strint (Contributor) commented Mar 16, 2022

> A thorough investigation of py::gil_scoped_acquire called from a non-main thread during Python finalization
>
> This pybind11 issue, pybind/pybind11#3274, fully explains the problem, and it matches our observations exactly.

This bug originates in Python itself: https://bugs.python.org/issue42969. The associated PR, python/cpython#28525, has not been merged yet, and it looks like they have not reached consensus on how to handle it. In other words, even upgrading to Python 3.11 would not solve the problem.

Since we need to support Python 3.6, 3.7, 3.8, 3.9, and 3.10, using atexit to work around this Python bug will be a long-term solution.

@lixinqi lixinqi force-pushed the export_env_to_python branch from c072305 to 86296cb Compare March 16, 2022 14:21
@lixinqi lixinqi force-pushed the export_env_to_python branch from 1873471 to 454f5e7 Compare March 16, 2022 16:02
Comment on lines +29 to +36
if (is_normal_exit) {
JUST(vm::ClusterSync());
auto* vm = JUST(GlobalMaybe<VirtualMachine>());
JUST(vm->CloseVMThreads());
}
JUST(env->init_is_normal_exit(is_normal_exit));
SetShuttingDown(true);
return Maybe<void>::Ok();
lixinqi (Contributor Author):

The old logic lived at the Python layer: on an abnormal system exit, DeleteEnv was not executed at all. To match that behavior, the destructor of EnvGlobalObjectsScope skips the series of Global<...>::Delete() calls when !is_normal_exit.

@@ -229,6 +229,7 @@ Maybe<void> EnvGlobalObjectsScope::Init(const EnvProto& env_proto) {
}

EnvGlobalObjectsScope::~EnvGlobalObjectsScope() {
if (is_normal_exit_.has_value() && !CHECK_JUST(is_normal_exit_)) { return; }
lixinqi (Contributor Author):

Perhaps this should be named is_abnormal_exit.

Comment on lines +79 to +81
std::mutex pending_instruction_mutex_;
PendingInstructionMutexedList pending_instruction_list_;
Notifier notifier_;
lixinqi (Contributor Author):

The channel has been removed entirely, replaced by a list + notifier. The reason: during finalization we terminate the worker threads and let instructions run on the main thread. A channel is too tightly bound to its thread; the worker thread can only exit once the channel is closed, and once the channel is closed, no other thread can send instructions through it. The list + notifier combination effectively decomposes the channel's functionality: closing the notifier lets the thread exit, while the list remains usable afterwards.
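The decomposition can be sketched in Python (assumed names, not the C++ classes in this PR): closing the notifier only stops the worker's wait loop, so the pending list stays usable and the main thread can keep executing instructions inline afterwards.

```python
import threading
from collections import deque

class Notifier:
    """Sketch of a notifier whose close() only ends the wait loop."""

    def __init__(self):
        self._cond = threading.Condition()
        self._count = 0
        self._closed = False

    def notify(self):
        with self._cond:
            self._count += 1
            self._cond.notify()

    def wait_and_clear(self):
        # True while notifications keep arriving; False once closed
        # with nothing pending.
        with self._cond:
            while self._count == 0 and not self._closed:
                self._cond.wait()
            if self._count > 0:
                self._count = 0
                return True
            return False

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

pending = deque()  # the "list" half: it survives notifier.close()
results = []

def worker(notifier):
    while notifier.wait_and_clear():
        while pending:
            results.append(pending.popleft()())

n = Notifier()
t = threading.Thread(target=worker, args=(n,))
t.start()

pending.append(lambda: 1 + 1)
n.notify()
n.close()
t.join()

# After close() the worker has exited, but the list is still usable:
# the main thread now runs instructions in place.
pending.append(lambda: 2 + 2)
results.append(pending.popleft()())
print(results)  # [2, 4]
```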

Comment on lines +108 to +110
while (thread_ctx->mut_notifier()->WaitAndClearNotifiedCnt() == kNotifierStatusSuccess) {
while (thread_ctx->TryReceiveAndRun()) {}
}
lixinqi (Contributor Author):

The logic here closely mirrors the handling of the scheduler thread and the callback thread.

@@ -115,7 +118,7 @@ VirtualMachine::VirtualMachine(const Resource& resource, int64_t this_machine_id
// In order to notify threads in VirtualMachineEngine, a notify callback lambda should be take as
// an argument for VirtualMachineEngine's constructor.
vm_ = intrusive::make_shared<vm::VirtualMachineEngine>(
vm::MakeVmDesc(resource, this_machine_id).Get(), [this]() { callback_notifier_.Notify(); });
lixinqi (Contributor Author):

callback_notifier_ has been replaced by ScheduleCtx.

Comment on lines +160 to +166
Maybe<void> VirtualMachine::CloseVMThreads() {
CHECK_OR_RETURN(!vm_threads_closed_);
ControlSync();
pending_notifier_.Close();
schedule_thread_.join();
CHECK(!vm_);
vm_threads_closed_ = true;
return Maybe<void>::Ok();
lixinqi (Contributor Author):

Closes the VM threads; from that point on, the vm executes in single-threaded mode.
This functionality is extracted from VirtualMachine's destructor so that it can be invoked from Python's atexit.

@@ -199,6 +212,8 @@ Maybe<void> VirtualMachine::Receive(vm::InstructionMsgList* instr_list) {
// `ComputeInFuseMode` will be replaced by `Compute` soon.
instr_msg->mut_instr_type_id()->instruction_type().ComputeInFuseMode(instr_msg);
}
} else if (unlikely(vm_threads_closed_)) {
lixinqi (Contributor Author):

vm_threads_closed_ is set to true in CloseVMThreads.

Comment on lines +248 to +251
void OnGarbageMsgPending() const override { vm_->Callback(); }
void OnWorkerLoadPending(vm::ThreadCtx* thread_ctx) const override {
while (thread_ctx->TryReceiveAndRun() > 0) {}
}
lixinqi (Contributor Author):

Once a task is received, it is executed in place.

if hook.is_normal_exit():
oneflow._oneflow_internal.DestroyEnv()
oneflow._oneflow_internal.SetShuttingDown()
_oneflow_global_unique_env_.SwitchToShuttingDownPhase(hook.is_normal_exit())
lixinqi (Contributor Author):

The logic deleted above is now placed inside the SwitchToShuttingDownPhase function.

@chengtbf chengtbf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 31, 2022 02:17
github-actions (bot):

Static analysis with clang failed. PR label automerge has been removed.

github-actions bot commented Apr 1, 2022

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.3ms (= 12833.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.6ms (= 14059.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 140.6ms / 128.3ms)

✔️ OneFlow resnet50 time: 77.9ms (= 7794.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.2ms (= 8623.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 86.2ms / 77.9ms)

OneFlow resnet50 time: 53.9ms (= 10770.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.1ms (= 11412.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.06 (= 57.1ms / 53.9ms)

OneFlow resnet50 time: 43.6ms (= 8729.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 50.7ms (= 10148.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.16 (= 50.7ms / 43.6ms)

OneFlow resnet50 time: 39.2ms (= 7839.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.4ms (= 7687.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.98 (= 38.4ms / 39.2ms)

OneFlow swin dataloader time: 0.245s (= 49.085s / 200, num_workers=1)
PyTorch swin dataloader time: 0.253s (= 50.666s / 200, num_workers=1)
✔️ Relative speed: 1.032 (= 0.253s / 0.245s)

OneFlow swin dataloader time: 0.067s (= 13.317s / 200, num_workers=4)
PyTorch swin dataloader time: 0.070s (= 14.092s / 200, num_workers=4)
✔️ Relative speed: 1.058 (= 0.070s / 0.067s)

OneFlow swin dataloader time: 0.036s (= 7.150s / 200, num_workers=8)
PyTorch swin dataloader time: 0.037s (= 7.474s / 200, num_workers=8)
✔️ Relative speed: 1.045 (= 0.037s / 0.036s)

✔️ OneFlow resnet50 time: 135.7ms (= 13574.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 157.3ms (= 15733.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 157.3ms / 135.7ms)

OneFlow resnet50 time: 89.5ms (= 8945.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.9ms (= 10285.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 102.9ms / 89.5ms)

OneFlow resnet50 time: 62.3ms (= 12462.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.8ms (= 15368.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 76.8ms / 62.3ms)

OneFlow resnet50 time: 53.9ms (= 10771.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.9ms (= 13183.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 65.9ms / 53.9ms)

OneFlow resnet50 time: 49.0ms (= 9807.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 61.1ms (= 12213.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 61.1ms / 49.0ms)

@strint strint requested review from oneflow-ci-bot and removed request for oneflow-ci-bot April 1, 2022 13:33
github-actions bot commented Apr 1, 2022

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.3ms (= 12834.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.6ms (= 14159.5ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.6ms / 128.3ms)

✔️ OneFlow resnet50 time: 77.5ms (= 7753.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.2ms (= 8417.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 84.2ms / 77.5ms)

OneFlow resnet50 time: 53.5ms (= 10690.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.8ms (= 12557.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 62.8ms / 53.5ms)

OneFlow resnet50 time: 44.5ms (= 8890.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.4ms (= 9472.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.07 (= 47.4ms / 44.5ms)

OneFlow resnet50 time: 40.4ms (= 8075.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.3ms (= 7658.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.95 (= 38.3ms / 40.4ms)

OneFlow swin dataloader time: 0.248s (= 49.546s / 200, num_workers=1)
PyTorch swin dataloader time: 0.249s (= 49.739s / 200, num_workers=1)
✔️ Relative speed: 1.004 (= 0.249s / 0.248s)

OneFlow swin dataloader time: 0.066s (= 13.169s / 200, num_workers=4)
PyTorch swin dataloader time: 0.068s (= 13.566s / 200, num_workers=4)
✔️ Relative speed: 1.030 (= 0.068s / 0.066s)

OneFlow swin dataloader time: 0.036s (= 7.226s / 200, num_workers=8)
PyTorch swin dataloader time: 0.036s (= 7.297s / 200, num_workers=8)
✔️ Relative speed: 1.010 (= 0.036s / 0.036s)

✔️ OneFlow resnet50 time: 135.7ms (= 13565.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 156.0ms (= 15603.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 156.0ms / 135.7ms)

OneFlow resnet50 time: 89.0ms (= 8902.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 105.0ms (= 10501.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 105.0ms / 89.0ms)

OneFlow resnet50 time: 61.3ms (= 12255.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.7ms (= 15336.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 76.7ms / 61.3ms)

OneFlow resnet50 time: 52.6ms (= 10520.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13296.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 66.5ms / 52.6ms)

OneFlow resnet50 time: 51.0ms (= 10200.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.6ms (= 14315.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 71.6ms / 51.0ms)

github-actions bot commented Apr 1, 2022

CI failed when running job: cuda-speed-test. PR label automerge has been removed

@github-actions github-actions bot removed the automerge label Apr 1, 2022
github-actions bot commented Apr 2, 2022

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.7ms (= 12871.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.5ms (= 14045.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.5ms / 128.7ms)

✔️ OneFlow resnet50 time: 78.5ms (= 7848.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 88.5ms (= 8846.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 88.5ms / 78.5ms)

OneFlow resnet50 time: 53.6ms (= 10718.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.0ms (= 11806.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.10 (= 59.0ms / 53.6ms)

OneFlow resnet50 time: 44.7ms (= 8940.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 52.3ms (= 10450.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.17 (= 52.3ms / 44.7ms)

OneFlow resnet50 time: 40.5ms (= 8101.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 43.1ms (= 8615.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.06 (= 43.1ms / 40.5ms)

OneFlow swin dataloader time: 0.252s (= 50.364s / 200, num_workers=1)
PyTorch swin dataloader time: 0.251s (= 50.187s / 200, num_workers=1)
✔️ Relative speed: 0.996 (= 0.251s / 0.252s)

OneFlow swin dataloader time: 0.069s (= 13.853s / 200, num_workers=4)
PyTorch swin dataloader time: 0.069s (= 13.782s / 200, num_workers=4)
✔️ Relative speed: 0.995 (= 0.069s / 0.069s)

OneFlow swin dataloader time: 0.036s (= 7.273s / 200, num_workers=8)
PyTorch swin dataloader time: 0.038s (= 7.681s / 200, num_workers=8)
✔️ Relative speed: 1.056 (= 0.038s / 0.036s)

✔️ OneFlow resnet50 time: 135.7ms (= 13573.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 158.4ms (= 15837.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 158.4ms / 135.7ms)

OneFlow resnet50 time: 91.6ms (= 9158.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.6ms (= 10363.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.6ms / 91.6ms)

OneFlow resnet50 time: 61.2ms (= 12249.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.0ms (= 15409.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 77.0ms / 61.2ms)

OneFlow resnet50 time: 52.9ms (= 10578.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13513.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 67.6ms / 52.9ms)

OneFlow resnet50 time: 47.3ms (= 9458.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 61.2ms (= 12247.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 61.2ms / 47.3ms)

github-actions bot commented Apr 2, 2022

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7792/

@mergify mergify bot merged commit a632a2e into master Apr 2, 2022
@mergify mergify bot deleted the export_env_to_python branch April 2, 2022 06:06
6 participants