MXNet USE_SSE=1 build uses AVX instruction set #14664

mjpost · 2019-04-10T14:47:10Z

On my desktop (with no GPU), with CUDA 10 libraries loaded, when I attempt to import mxnet in Python, I get the following error:

$ pip list | grep mxnet
mxnet-cu100mkl    1.3.1     
$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Illegal instruction
$

But everything works fine in another conda environment with mxnet-cu92mkl and CUDA 9.2 libraries loaded:

$ pip list | grep mxnet
mxnet-cu92mkl    1.3.1     
$ python
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>>

Is there any advice on how to fix this? I cannot find a similar issue filed.

mjpost · 2019-04-10T14:53:17Z

I also tried upgrading to MXNet 1.4.0, but got the same "Illegal instruction" problem.

wkcn · 2019-04-10T15:58:25Z

Why did you install the GPU version on a machine without a GPU? It seems the program crashes because the CUDA library files are not found.
You can install the CPU version of MXNet: mxnet-mkl

If you want to install the GPU version, could you please provide the stacktrace?

gdb python
r
import mxnet
bt
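
A scripted complement to the gdb session above, as a minimal sketch: it imports a module in a child interpreter and reports whether the child was killed by SIGILL, which the shell prints as "Illegal instruction". The helper name `import_crashes_with_sigill` is made up for illustration, and the return-code check assumes a POSIX system. It does not replace the backtrace, it only confirms the kind of crash.

```python
import signal
import subprocess
import sys


def import_crashes_with_sigill(module_name: str) -> bool:
    """Import `module_name` in a child interpreter (hypothetical helper).

    Returns True only if the child process was killed by SIGILL.
    A missing module or any other failure returns False.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means the child died from signal N.
    return result.returncode == -signal.SIGILL
```

Calling `import_crashes_with_sigill("mxnet")` in each conda environment would show which installs are affected without crashing the calling interpreter.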

mjpost · 2019-04-10T16:02:37Z

The GPU libraries are available and in my path. If I unload them, I get a different message from the linker.

The reason is that I am submitting jobs to a cluster. I have one job that uses MXNet but just for data preparation so it doesn't need a GPU. It's a pain to have to maintain separate environments. As long as the libraries are found, it seems MXNet should be able to work smoothly with a CPU device, instead of a GPU one?

Plus, this used to work fine.

wkcn · 2019-04-10T16:08:37Z

I will test it. I have the environment with CUDA 10 and without GPU.

mjpost · 2019-04-10T16:09:36Z

Thank you!

wkcn · 2019-04-11T02:41:51Z

Sorry, the version of CUDA on my machine is 10.1.105. It seems there is no matching pre-built MXNet package.

lanking520 · 2019-04-11T23:13:20Z

@szha

szha · 2019-04-12T05:26:07Z

Since cu92mkl and cu100mkl differ only in cuda/cudnn versions, I'm guessing this is related to cudnn, since the library is statically linked. @mjpost, does it work if you try to build from source? You can find the configuration in #8671

szha · 2019-04-12T17:25:57Z

@mjpost just as a quick sanity check, does it work if you install from nightly build? (i.e. pip install --pre)

mjpost · 2019-04-12T17:58:35Z

I get the same error with the nightly build:

$ module list

Currently Loaded Modules:
  1) shared   2) StdEnv   3) dot   4) uge/8.6.4   5) default-environment   6) cuda10.0/toolkit/10.0.130   7) gcc/5.4.0   8) cudnn/7.5.0_cuda10.0

$ pip install -U --pre mxnet-cu100mkl
Collecting mxnet-cu100mkl
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",)': /packages/c7/c2/06986f51da6052f4fa673ed1c5f55842c3f216b5e1b8d25bb59a588e8161/mxnet_cu100mkl-1.5.0b20190412-py2.py3-none-manylinux1_x86_64.whl
  Downloading https://files.pythonhosted.org/packages/c7/c2/06986f51da6052f4fa673ed1c5f55842c3f216b5e1b8d25bb59a588e8161/mxnet_cu100mkl-1.5.0b20190412-py2.py3-none-manylinux1_x86_64.whl (554.8MB)
    100% |████████████████████████████████| 554.8MB 17kB/s 
Requirement already satisfied, skipping upgrade: graphviz<0.9.0,>=0.8.1 in ./.conda/envs/cu100/lib/python3.6/site-packages (from mxnet-cu100mkl) (0.8.4)
Requirement already satisfied, skipping upgrade: numpy<1.15.0,>=1.8.2 in ./.conda/envs/cu100/lib/python3.6/site-packages (from mxnet-cu100mkl) (1.14.6)
Requirement already satisfied, skipping upgrade: requests>=2.20.0 in ./.conda/envs/cu100/lib/python3.6/site-packages (from mxnet-cu100mkl) (2.21.0)
Requirement already satisfied, skipping upgrade: urllib3<1.25,>=1.21.1 in ./.conda/envs/cu100/lib/python3.6/site-packages (from requests>=2.20.0->mxnet-cu100mkl) (1.24.1)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in ./.conda/envs/cu100/lib/python3.6/site-packages (from requests>=2.20.0->mxnet-cu100mkl) (2019.3.9)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in ./.conda/envs/cu100/lib/python3.6/site-packages (from requests>=2.20.0->mxnet-cu100mkl) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in ./.conda/envs/cu100/lib/python3.6/site-packages (from requests>=2.20.0->mxnet-cu100mkl) (2.8)
Installing collected packages: mxnet-cu100mkl
  Found existing installation: mxnet-cu100mkl 1.4.0.post0
    Uninstalling mxnet-cu100mkl-1.4.0.post0:
      Successfully uninstalled mxnet-cu100mkl-1.4.0.post0
Successfully installed mxnet-cu100mkl-1.5.0b20190412

$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Illegal instruction
$

mjpost · 2019-04-12T17:59:29Z

But note that I am trying to stick with MXNet 1.3.1, since I have a branch of research code using software (sockeye) that isn't upgraded to 1.4 yet.

szha · 2019-04-12T19:11:53Z

@mjpost the only difference between the mxnet-cu92* and mxnet-cu100* would be in the cuda/cudnn library. @DickJC123 do you know if the cuda libraries started to use any new instruction sets?

mjpost · 2019-04-12T19:19:03Z

Could it have to do with the AVX instruction set? I found a few issues where people mention this, but I don't perfectly understand them.

Note that I haven't tried building from source yet. I hope to do that this weekend.

szha · 2019-04-12T22:11:18Z

@mjpost mxnet pre-built binaries require the AVX2 instruction set. However, this is true for all packages and not just cu100. So it must be something else.

bricksdont · 2019-05-16T07:58:52Z

Same problem here. Any updates on this?

k128 · 2020-01-21T22:45:46Z

I'm getting the same error
Ubuntu 18.04
pip install mxnet-mkl
Also tried pip install mxnet and pip install mxnet-mkl --pre

leezu · 2020-01-21T22:50:37Z

@k128 please provide information about your CPU, i.e. the output of cat /proc/cpuinfo

k128 · 2020-01-21T23:03:57Z

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1649.507
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1605.370
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1655.422
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1655.970
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1663.665
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1633.267
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1684.141
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
microcode : 0x12
cpu MHz : 1703.844
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti tpr_shadow vnmi flexpriority ept vpid dtherm ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5345.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

leezu · 2020-01-22T22:46:01Z

Thanks @k128. Your CPU doesn't support the AVX instruction set, but the binary package you obtained via pip install requires AVX. We may drop this AVX requirement from future versions of the binary packages, but in the meantime you need to build from source.

Also, please provide the output of gcc -Q --help=target

You can find build from source instructions at https://mxnet.apache.org/get_started/ubuntu_setup
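
The check described above (looking for `avx` in the `flags` line of /proc/cpuinfo) can be sketched in a few lines of Python. `cpu_flags` and `missing_features` are hypothetical helper names, and requiring only `avx` by default is an assumption drawn from this thread, not a documented requirement list of the pip packages.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first processor entry in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text: str, required=("avx",)) -> list:
    """List the required features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [feature for feature in required if feature not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        print("missing:", missing_features(cpuinfo.read_text()))
```

For the i7 920 output posted earlier, the flags line contains sse4_2 but no avx, so `avx` would be reported as missing.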

k128 · 2020-01-22T22:50:52Z

The following options are target specific:
  -m128bit-long-double        		[enabled]
  -m16                        		[disabled]
  -m32                        		[disabled]
  -m3dnow                     		[disabled]
  -m3dnowa                    		[disabled]
  -m64                        		[enabled]
  -m80387                     		[enabled]
  -m8bit-idiv                 		[disabled]
  -m96bit-long-double         		[disabled]
  -mabi=                      		sysv
  -mabm                       		[disabled]
  -maccumulate-outgoing-args  		[disabled]
  -maddress-mode=             		long
  -madx                       		[disabled]
  -maes                       		[disabled]
  -malign-data=               		compat
  -malign-double              		[disabled]
  -malign-functions=          		0
  -malign-jumps=              		0
  -malign-loops=              		0
  -malign-stringops           		[enabled]
  -mandroid                   		[disabled]
  -march=                     		x86-64
  -masm=                      		att
  -mavx                       		[disabled]
  -mavx2                      		[disabled]
  -mavx256-split-unaligned-load 	[enabled]
  -mavx256-split-unaligned-store 	[enabled]
  -mavx5124fmaps              		[disabled]
  -mavx5124vnniw              		[disabled]
  -mavx512bw                  		[disabled]
  -mavx512cd                  		[disabled]
  -mavx512dq                  		[disabled]
  -mavx512er                  		[disabled]
  -mavx512f                   		[disabled]
  -mavx512ifma                		[disabled]
  -mavx512pf                  		[disabled]
  -mavx512vbmi                		[disabled]
  -mavx512vl                  		[disabled]
  -mavx512vpopcntdq           		[disabled]
  -mbionic                    		[disabled]
  -mbmi                       		[disabled]
  -mbmi2                      		[disabled]
  -mbranch-cost=              		3
  -mcld                       		[disabled]
  -mclflushopt                		[disabled]
  -mclwb                      		[disabled]
  -mclzero                    		[disabled]
  -mcmodel=                   		[default]
  -mcpu=                      		
  -mcrc32                     		[disabled]
  -mcx16                      		[disabled]
  -mdispatch-scheduler        		[disabled]
  -mdump-tune-features        		[disabled]
  -mf16c                      		[disabled]
  -mfancy-math-387            		[enabled]
  -mfentry                    		[disabled]
  -mfma                       		[disabled]
  -mfma4                      		[disabled]
  -mforce-drap                		[disabled]
  -mfp-ret-in-387             		[enabled]
  -mfpmath=                   		sse
  -mfsgsbase                  		[disabled]
  -mfunction-return=          		keep
  -mfused-madd                		
  -mfxsr                      		[enabled]
  -mgeneral-regs-only         		[disabled]
  -mglibc                     		[enabled]
  -mhard-float                		[enabled]
  -mhle                       		[disabled]
  -miamcu                     		[disabled]
  -mieee-fp                   		[enabled]
  -mincoming-stack-boundary=  		0
  -mindirect-branch-register  		[disabled]
  -mindirect-branch=          		keep
  -minline-all-stringops      		[disabled]
  -minline-stringops-dynamically 	[disabled]
  -mintel-syntax              		
  -mlarge-data-threshold=<number> 	65536
  -mlong-double-128           		[disabled]
  -mlong-double-64            		[disabled]
  -mlong-double-80            		[enabled]
  -mlwp                       		[disabled]
  -mlzcnt                     		[disabled]
  -mmemcpy-strategy=          		
  -mmemset-strategy=          		
  -mmitigate-rop              		[disabled]
  -mmmx                       		[enabled]
  -mmovbe                     		[disabled]
  -mmpx                       		[disabled]
  -mms-bitfields              		[disabled]
  -mmusl                      		[disabled]
  -mmwaitx                    		[disabled]
  -mno-align-stringops        		[disabled]
  -mno-default                		[disabled]
  -mno-fancy-math-387         		[disabled]
  -mno-push-args              		[disabled]
  -mno-red-zone               		[disabled]
  -mno-sse4                   		[enabled]
  -mnop-mcount                		[disabled]
  -momit-leaf-frame-pointer   		[disabled]
  -mpc32                      		[disabled]
  -mpc64                      		[disabled]
  -mpc80                      		[disabled]
  -mpclmul                    		[disabled]
  -mpcommit                   		[disabled]
  -mpku                       		[disabled]
  -mpopcnt                    		[disabled]
  -mprefer-avx128             		[disabled]
  -mpreferred-stack-boundary= 		0
  -mprefetchwt1               		[disabled]
  -mprfchw                    		[disabled]
  -mpush-args                 		[enabled]
  -mrdpid                     		[disabled]
  -mrdrnd                     		[disabled]
  -mrdseed                    		[disabled]
  -mrecip                     		[disabled]
  -mrecip=                    		
  -mrecord-mcount             		[disabled]
  -mred-zone                  		[enabled]
  -mregparm=                  		6
  -mrtd                       		[disabled]
  -mrtm                       		[disabled]
  -msahf                      		[disabled]
  -msgx                       		[disabled]
  -msha                       		[disabled]
  -mskip-rax-setup            		[disabled]
  -msoft-float                		[disabled]
  -msse                       		[enabled]
  -msse2                      		[enabled]
  -msse2avx                   		[disabled]
  -msse3                      		[disabled]
  -msse4                      		[disabled]
  -msse4.1                    		[disabled]
  -msse4.2                    		[disabled]
  -msse4a                     		[disabled]
  -msse5                      		
  -msseregparm                		[disabled]
  -mssse3                     		[disabled]
  -mstack-arg-probe           		[disabled]
  -mstack-protector-guard=    		tls
  -mstackrealign              		[disabled]
  -mstringop-strategy=        		[default]
  -mstv                       		[enabled]
  -mtbm                       		[disabled]
  -mtls-dialect=              		gnu
  -mtls-direct-seg-refs       		[enabled]
  -mtune-ctrl=                		
  -mtune=                     		generic
  -muclibc                    		[disabled]
  -mveclibabi=                		[default]
  -mvect8-ret-in-mem          		[disabled]
  -mvzeroupper                		[enabled]
  -mx32                       		[disabled]
  -mxop                       		[disabled]
  -mxsave                     		[disabled]
  -mxsavec                    		[disabled]
  -mxsaveopt                  		[disabled]
  -mxsaves                    		[disabled]

  Known assembler dialects (for use with the -masm= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
    keep thunk thunk-extern thunk-inline

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
    vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

leezu · 2020-01-22T23:56:23Z

Thank you. Please let me know if you face any issues with the source compiled version of mxnet (https://mxnet.apache.org/get_started/ubuntu_setup)

k128 · 2020-01-23T12:42:45Z

I might be doing something completely wrong here, but I installed MKL via APT without error and used the code on this page in order: https://mxnet.apache.org/api/python/docs/tutorials/performance/backend/mkldnn/mkldnn_readme.html

Exact code I ran:

#Installing MKL
sudo bash
cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
exit
sudo wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list
sudo apt update
sudo apt install intel-mkl-64bit-2020.0-088

#Building MXNet
sudo apt install -y build-essential git libopenblas-dev liblapack-dev libopencv-dev graphviz
git clone --recursive https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
mkdir build && cd build
cmake -DUSE_CUDA=OFF -DUSE_MKL_IF_AVAILABLE=ON -DUSE_MKLDNN=ON -DUSE_OPENMP=ON -DUSE_OPENCV=ON ..
make -j $(nproc)

Results:
Lots of warnings like: 
/home/k64/incubator-mxnet/src/operator/numpy/./../nn/../tensor/././elemwise_unary_op.h:262:50: warning: typedef ‘DType’ locally defined but not used [-Wunused-local-typedefs]
     MXNET_INT_TYPE_SWITCH(outputs[0].type_flag_, DType, {

Errors:
CMakeFiles/Makefile2:1165: recipe for target 'CMakeFiles/mxnet_static.dir/all' failed
make[1]: *** [CMakeFiles/mxnet_static.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

leezu · 2020-01-23T17:33:52Z

@k128 please post the complete output of the cmake command as well as of make -j $(nproc). You can attach them as files to your response.

leezu · 2020-01-24T23:46:59Z

I can reproduce this issue on AWS c1 instances (introduced in 2010), but not on an m3 instance (introduced in 2012). Note that m3 doesn't support AVX2, so this issue is unrelated to AVX2.

m3 supports the following additional instruction sets compared to c1:

> avx
> f16c
> x2apic
> xsave
> xsaveopt
> xtopology

Given the nature of this problem, it's likely due to the AVX instruction set.

Further, the use of the __m128d type in https://github.com/apache/incubator-mxnet/blob/1434b98e26ace5300f17465fbb2942272a3dfd77/3rdparty/mshadow/mshadow/packet/sse-inl.h
gets converted by GCC to the AVX instruction VBROADCASTSS. This may be a GCC bug, but it is one of the ways we get AVX into the library.
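
One rough way to look for such instructions in a built library is to scan its disassembly. The sketch below is a heuristic under stated assumptions: it expects the default AT&T-syntax output of `objdump -d` (tab-separated address, bytes, instruction), and it treats any mnemonic starting with `v` that touches an xmm or ymm register as VEX-encoded. The names `avx_instructions` and `disassemble` are illustrative, and the pattern may also match AVX-512 forms.

```python
import re
import subprocess

# Heuristic: VEX-encoded instructions have "v"-prefixed mnemonics
# (vbroadcastss, vmovaps, ...) and operate on %xmm/%ymm registers.
VEX_PATTERN = re.compile(r"^v[a-z0-9]+\s+.*%[xy]mm\d+")


def avx_instructions(disassembly: str) -> list:
    """Return instructions in objdump text that look VEX-encoded."""
    hits = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # headers, labels, byte-continuation lines
        instruction = fields[2].strip()
        if VEX_PATTERN.match(instruction):
            hits.append(instruction)
    return hits


def disassemble(path: str) -> str:
    """Run `objdump -d` on a binary and return its text output."""
    result = subprocess.run(
        ["objdump", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout
```

Something like `avx_instructions(disassemble("libmxnet.so"))` would then list candidate instructions. An empty result is suggestive only, since statically linked dependencies may select code paths at runtime.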

leezu · 2020-01-24T23:55:06Z

@mjpost could you verify whether mxnet and mxnet-cu100 1.5.1 still don't work for you? If so, please also share the output of cat /proc/cpuinfo from your machine.

leezu · 2020-01-27T21:34:04Z

@k128 @mjpost could you try installing https://lausen-public.s3.amazonaws.com/mxnet_cu100-1.6.0b20200127-py2.py3-none-manylinux1_x86_64.whl and report if it works?

The library is built using #17448. Setting -mno-avx in our cmake build correctly disables AVX instructions. To completely get rid of AVX inside libmxnet.so, we also need to build the statically linked dependencies accordingly. This is not yet done (maybe conan / vcpkg can help), and as a workaround the build is performed on an m2.4xlarge instance (which doesn't have AVX).

lanking520 added Breaking CUDA pip labels Apr 10, 2019

leezu changed the title ~~mxnet-cu100mkl "illegal instruction" on CPU~~ tools/staticbuild/build.sh uses AVX2 instruction set Jan 23, 2020

leezu self-assigned this Jan 23, 2020

leezu changed the title ~~tools/staticbuild/build.sh uses AVX2 instruction set~~ MXNet USE_SSE=1 build uses AVX instruction set Jan 24, 2020

leezu mentioned this issue Jan 30, 2020

Add -march=native -mtune=native to config/config.cmake #17468

Open

1 task

leezu mentioned this issue Mar 16, 2020

CI: Attempt fixing illegal instruction errors #17842

Merged
