[install MXNET] wrong: src/operator/contrib/roi_align_v2.cc #76

heiyuxiaokai · 2019-04-23T05:48:21Z

src/operator/contrib/roi_align_v2.cc:210:2: error: no matching function for call to ‘nnvm::Op::set_attr(const char [12], mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>, std::vector<mxnet::TShape>)>)’
})
^
In file included from include/mxnet/base.h:35:0,
from src/operator/contrib/./../mshadow_op.h:29,
from src/operator/contrib/./roi_align_v2-inl.h:12,
from src/operator/contrib/roi_align_v2.cc:7:
/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include/nnvm/op.h:432:12: note: candidate: template nnvm::Op& nnvm::Op::set_attr(const string&, const ValueType&, int)
inline Op& Op::set_attr( // NOLINT()
^
/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include/nnvm/op.h:432:12: note: template argument deduction/substitution failed:
src/operator/contrib/roi_align_v2.cc:210:2: note: cannot convert ‘mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>, std::vector<mxnet::TShape>)>{}’ (type ‘mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>, std::vector<mxnet::TShape>)>’) to type ‘const std::function<bool(const nnvm::NodeAttrs&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >)>&’
})
^
src/operator/contrib/roi_align_v2.cc:211:27: error: expected primary-expression before ‘>’ token
.set_attr<nnvm::FInferType>("FInferType", [](const nnvm::NodeAttrs& attrs,
^
src/operator/contrib/roi_align_v2.cc:223:1: warning: left operand of comma operator has no effect [-Wunused-value]
})
^
src/operator/contrib/roi_align_v2.cc:224:2: error: ‘struct mxnet::op::<lambda(const struct nnvm::NodeAttrs&, class std::vector<int, std::allocator >, class std::vector<int, std::allocator >)>’ has no member named ‘set_attr’
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:19: error: expected primary-expression before ‘>’ token
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:38: warning: left operand of comma operator has no effect [-Wunused-value]
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:38: error: no context to resolve type of ‘ROIAlignForward_v2<mxnet::cpu>’
src/operator/contrib/roi_align_v2.cc:225:26: error: expected primary-expression before ‘>’ token
.set_attr<nnvm::FGradient>("FGradient", ROIAlignGrad_v2{"_backward_ROIAlign_v2"})
^
src/operator/contrib/roi_align_v2.cc:225:80: warning: left operand of comma operator has no effect [-Wunused-value]
.set_attr<nnvm::FGradient>("FGradient", ROIAlignGrad_v2{"_backward_ROIAlign_v2"})
^
src/operator/contrib/roi_align_v2.cc:226:2: error: ‘struct mxnet::op::ROIAlignGrad_v2’ has no member named ‘add_argument’
.add_argument("data", "NDArray-or-Symbol", "Input data to the pooling operator, a 4D Feature maps")
^
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mshadow/ -I/home/fw/Softwares/simpledet/mxnet/3rdparty/dmlc-core/include -fPIC -I/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/dlpack/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mkldnn/build/install/include -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_MKLDNN=1 -DUSE_MKL=1 -I/home/fw/Softwares/simpledet/mxnet/src/operator/nn/mkldnn/ -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mkldnn/build/install/include -DMXNET_USE_OPENCV=0 -DMSHADOW_USE_CUDNN=1 -DMXNET_USE_DIST_KVSTORE -I/home/fw/Softwares/simpledet/mxnet/3rdparty/ps-lite/include -I/home/fw/Softwares/simpledet/mxnet/deps/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/nvidia_cub -I/include -DMXNET_USE_NCCL=1 -DMXNET_USE_LIBJPEG_TURBO=0 -MMD -c src/operator/contrib/sync_batch_norm.cc -o build/src/operator/contrib/sync_batch_norm.o
Makefile:508: recipe for target 'build/src/operator/contrib/roi_align_v2.o' failed
make: *** [build/src/operator/contrib/roi_align_v2.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from src/operator/contrib/sync_batch_norm.cc:26:0:
src/operator/contrib/sync_batch_norm-inl.h: In member function ‘virtual bool mxnet::op::SyncBatchNormProp::InferType(std::vector<int, std::allocator >, std::vector<int, std::allocator >, std::vector<int, std::allocator >) const’:
src/operator/contrib/sync_batch_norm-inl.h:587:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (index_t i = 1; i < in_type->size(); ++i) {
^
src/operator/contrib/sync_batch_norm-inl.h:594:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (index_t i = 0; i < aux_type->size(); ++i) {
^

xchani · 2019-04-24T08:08:26Z

This is caused by the NNVM API change in apache/mxnet#14270. We will fix it soon; in the meantime, you can switch to an earlier version of MXNet, such as 1.4.0.
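
For anyone scripting their environment setup, the advice above can be sketched as a small version check. This is only an illustration: the helper names `parse_version` and `is_known_good` are hypothetical, and the threshold assumes nothing beyond the statement that 1.4.0 works. Which release first contains the API change is not stated in this thread, so the check is deliberately conservative.

```python
def parse_version(version):
    """Turn '1.4.0' or '1.5.0b20190423' into a tuple of three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'b20190423'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Assumption: 1.4.0 is the newest version named as working in this thread.
LAST_KNOWN_GOOD = (1, 4, 0)


def is_known_good(version):
    """Return True if `version` is no newer than the last known-good release."""
    return parse_version(version) <= LAST_KNOWN_GOOD


# With MXNet installed, the string to test would come from mxnet.__version__.
print(is_known_good("1.4.0"))
print(is_known_good("1.5.0"))
```

A `False` result does not prove the build will fail; it only means the version is newer than the one recommended here.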

heiyuxiaokai · 2019-04-24T08:16:11Z

Thanks for your reply, @xchani!
I will try an earlier version. However, when I tried training with the docker image, I got a CUDA error:

04-24 16:02:04 total iter 868428
04-24 16:02:04 lr 0.00125, lr_iters [2880000, 3840000]
04-24 16:02:04 warmup lr 0.0, warmup step 48000
04-24 16:02:07 Initialized bbox_cls_logit_bias as bias: 0.0
04-24 16:02:07 Initialized bbox_cls_logit_weight as ["normal", {"sigma": 0.01}]: 0.009981153
04-24 16:02:07 Initialized bbox_reg_delta_bias as bias: 0.0
04-24 16:02:07 Initialized bbox_reg_delta_weight as ["normal", {"sigma": 0.001}]: 0.0009942584
04-24 16:02:07 Initialized rpn_bbox_delta_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_bbox_delta_weight as ["normal", {"sigma": 0.01}]: 0.009982816
04-24 16:02:07 Initialized rpn_cls_logit_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_cls_logit_weight as ["normal", {"sigma": 0.01}]: 0.010042911
04-24 16:02:07 Initialized rpn_conv_3x3_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_conv_3x3_weight as ["normal", {"sigma": 0.01}]: 0.009972133
04-24 16:02:08 Initialized stage3_unit21_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit21_conv2_offset_weight as weight: 0.029432593
04-24 16:02:08 Initialized stage3_unit22_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit22_conv2_offset_weight as weight: 0.029415503
04-24 16:02:08 Initialized stage3_unit23_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit23_conv2_offset_weight as weight: 0.029460358
Traceback (most recent call last):
File "detection_train.py", line 231, in <module>
train_net(parse_args())
File "detection_train.py", line 215, in train_net
num_epoch=end_epoch
File "/home/core/detection_module.py", line 995, in fit
self.update_metric(eval_metric, data_batch.label)
File "/home/core/detection_module.py", line 783, in update_metric
self.exec_group.update_metric(eval_metric, labels, pre_sliced)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/module/executor_group.py", line 639, in update_metric
eval_metric.update_dict(labels, preds)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/metric.py", line 304, in update_dict
metric.update_dict(labels, preds)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/metric.py", line 132, in update_dict
self.update(label, pred)
File "/home/core/detection_metric.py", line 41, in update
pred_label = mx.ndarray.argmax_channel(pred).astype('int32').asnumpy().reshape(-1)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
ctypes.c_size_t(data.size)))
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/base.py", line 251, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [08:02:14] /mnt/ournas/yuntao.chen/mxnet-1.3.1-cuda9.0/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (48 vs. 0) Name: MapPlanKernel ErrStr:no kernel image is available for execution on the device

Stack trace returned 10 entries:
[bt] (0) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f49ad84743b]
[bt] (1) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f49ad847fa8]
[bt] (2) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mshadow::cuda::MapPlan<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::expr::ScalarExp, float>(mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 2, float>, float>, mshadow::expr::Plan<mshadow::expr::ScalarExp, float> const&, mshadow::Shape<2>, CUstream_st*)+0x1d0) [0x7f49b2563b30]
[bt] (3) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mxnet::ndarray::Evalmshadow::gpu(float const&, mxnet::TBlob*, mxnet::RunContext)+0x16a) [0x7f49b275ce2a]
[bt] (4) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(+0x3890d19) [0x7f49b0342d19]
[bt] (5) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(+0x3df061b) [0x7f49b08a261b]
[bt] (6) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f49b089bf15]
[bt] (7) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptrdmlc::ManualEvent const&)+0xeb) [0x7f49b08b28ab]
[bt] (8) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptrdmlc::ManualEvent), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptrdmlc::ManualEvent)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptrdmlc::ManualEvent&&)+0x4e) [0x7f49b08b2b1e]
[bt] (9) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptrdmlc::ManualEvent)> (std::shared_ptrdmlc::ManualEvent)> >::_M_run()+0x4a) [0x7f49b089b51a]

RogerChern · 2019-04-24T08:20:43Z

Could you please tell us your GPU model? This docker image will not run on RTX GPUs.

heiyuxiaokai · 2019-04-24T08:23:07Z

2X GTX TITAN X @RogerChern

RogerChern · 2019-04-24T08:49:25Z

@heiyuxiaokai Maxwell or Pascal TITAN X?

heiyuxiaokai · 2019-04-24T09:43:42Z

Maxwell @RogerChern

xchani · 2019-04-24T10:03:16Z

@heiyuxiaokai We will provide a docker image for this GPU architecture later.
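
For context, "no kernel image is available for execution on the device" typically means the binary contains no GPU code that matches the device's compute capability. The sketch below illustrates the usual CUDA compatibility rule: compiled (SASS) code runs on devices of the same major capability with an equal or higher minor version, and embedded PTX can be JIT-compiled for any equal or newer capability. The function name `has_kernel_image` and the build's architecture list are hypothetical; the thread does not say which architectures the docker image was compiled for.

```python
# Compute capabilities (major, minor) of the two TITAN X generations
# mentioned in this thread.
COMPUTE_CAPABILITY = {
    "TITAN X (Maxwell)": (5, 2),
    "TITAN X (Pascal)": (6, 1),
}


def has_kernel_image(device_cc, sass_archs, ptx_archs=()):
    """Return True if a build with these architectures can run on the device."""
    major, minor = device_cc
    # SASS is binary compatible within one major version, for equal or
    # higher minor versions.
    for sass_major, sass_minor in sass_archs:
        if sass_major == major and sass_minor <= minor:
            return True
    # PTX can be JIT-compiled for any device of equal or newer capability.
    return any(tuple(ptx) <= device_cc for ptx in ptx_archs)


# Hypothetical architecture list, chosen only to illustrate the failure mode.
HYPOTHETICAL_BUILD = [(6, 0), (6, 1), (7, 0)]

for name, cc in COMPUTE_CAPABILITY.items():
    print(name, has_kernel_image(cc, HYPOTHETICAL_BUILD))
```

Under that assumed list, the Maxwell card has no usable kernel image while the Pascal card does, which would be consistent with the error reported above.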

heiyuxiaokai · 2019-04-24T10:52:47Z

@xchani Thanks!

heiyuxiaokai · 2019-04-24T12:36:49Z

mxnet==1.3x works.
However, when I import mxnet, libcudart.so.8.0 can't be found, even though my Ubuntu machine uses CUDA 9.
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import mxnet
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/__init__.py", line 24, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/context.py", line 24, in <module>
from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/base.py", line 213, in <module>
_LIB = _load_lib()
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/base.py", line 204, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Here is the ldd output for libmxnet.so:
fw@whu:~/Softwares/simpledet/mxnet/lib$ ldd libmxnet.so
linux-vdso.so.1 => (0x00007fff1376e000)
libcudart.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.9.0 (0x00007fb16dbdc000)
libcublas.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.9.0 (0x00007fb16a7a6000)
libcurand.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.9.0 (0x00007fb166842000)
libcusolver.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.9.0 (0x00007fb161c47000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fb15fbb3000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb15f9ab000)
libcudnn.so.7 => /usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.7 (0x00007fb14e514000)
libcufft.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.9.0 (0x00007fb146473000)
libnccl.so.1 => /usr/local/lib/libnccl.so.1 (0x00007fb143e10000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb143a8e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb143785000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb143563000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb14334d000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb143130000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb142d66000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb18057a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb142b62000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fb142837000)
libcudart.so.8.0 => not found
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb1425f8000)

The CUDA 9 libraries are found, but libcudart.so.8.0 is not, and I have no CUDA 8 installed. Is CUDA 8 required?
@xchani @RogerChern
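
A listing like the one above can be scanned automatically for unresolved entries. The sketch below is a minimal example, with the hypothetical helper name `missing_libraries`; it only parses text in the `name => path` layout shown above. Note that ldd also lists indirect dependencies, so the CUDA 8 runtime may be requested by one of the other libraries in the list rather than by libmxnet.so itself; running ldd on each resolved path is one way to check.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


# Two lines copied from the ldd output above.
sample = """\
libcudart.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.9.0 (0x00007fb16dbdc000)
libcudart.so.8.0 => not found
"""

print(missing_libraries(sample))  # ['libcudart.so.8.0']
```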

RogerChern · 2019-04-25T03:01:25Z

We have updated the CUDA 9 image to support Maxwell GPUs. Please follow the setup instructions.

xchani added the bug label Apr 24, 2019

RogerChern closed this as completed Apr 25, 2019
