Current MXNet-Dev master breaks loading of certain models #15337
Comments
Hey, this is the MXNet Label Bot.
@mxnet-label-bot add [Bug]
@mxnet-label-bot add [Backend]
This issue might also be related to #15281.
Hi @QueensGambit, I'm getting a file-not-found error when following the steps to reproduce. I do have
Also, I'm getting a parameter-not-found error when trying to load the symbol and params directly
@roywei Thank you for the reply. This is how the model is currently loaded:

    import mxnet as mx

    sym = mx.sym.load(self.symbol_path)
    # https://github.com/apache/incubator-mxnet/issues/6951
    save_dict = mx.nd.load(self.params_path)
    # keys are stored as "arg:<name>" or "aux:<name>"; split them into two dicts
    arg_params = {}
    aux_params = {}
    for key, val in save_dict.items():
        param_type, name = key.split(":", 1)
        if param_type == "arg":
            arg_params[name] = val
        if param_type == "aux":
            aux_params[name] = val
    # set the context to CPU, or to GPU if one was requested
    if ctx == "cpu":
        self.ctx = mx.cpu()
    elif ctx == "gpu":
        self.ctx = mx.gpu()
    else:
        raise Exception("Unavailable ctx mode given %s. You must either select 'cpu' or 'gpu'" % ctx)
    # define batch_size executor objects which are used for inference
    # one executor object is used for the currently requested batch length
    # the requested batch length is variable and at most the given batch_size
    self.executors = []
    for i in range(batch_size):
        executor = sym.simple_bind(
            ctx=self.ctx,
            # add a new length for each size starting with 1
            data=(i + 1, NB_CHANNELS_FULL, BOARD_HEIGHT, BOARD_WIDTH),
            grad_req="null",
            force_rebind=True,
        )
        executor.copy_params_from(arg_params, aux_params)
        self.executors.append(executor)
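The key-splitting step in the loader above can be isolated into a small helper. This is a minimal sketch, assuming the saved dictionary uses the `arg:<name>` / `aux:<name>` key convention shown in the loop; `split_params` is a hypothetical name and is not part of MXNet. Unlike the original loop, it raises on an unexpected prefix instead of silently skipping the entry, which is a deliberate change to surface malformed checkpoints early.

```python
def split_params(save_dict):
    """Split a flat checkpoint dict into (arg_params, aux_params).

    Assumes keys look like "arg:<name>" or "aux:<name>", as in the loop above.
    Works on any mapping, so it can be tried without MXNet installed.
    """
    arg_params, aux_params = {}, {}
    for key, val in save_dict.items():
        param_type, sep, name = key.partition(":")
        if not sep:
            raise ValueError("key %r has no 'arg:' or 'aux:' prefix" % key)
        if param_type == "arg":
            arg_params[name] = val
        elif param_type == "aux":
            aux_params[name] = val
        else:
            raise ValueError("unexpected prefix %r in key %r" % (param_type, key))
    return arg_params, aux_params
```

With MXNet available, it would be called as `arg_params, aux_params = split_params(mx.nd.load(params_path))`.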
I think I know why the loading fails; thank you for the help, @roywei. It's because I ported the training code from Gluon to MXNet for this model. The reason for this was that I experienced long delays during training due to

Apparently, in MXNet version 1.4.1 the code above works and ignores the missing label information, whereas version 1.5.0 blocks it, which is a behaviour I appreciate. Using this code I'm able to load the model successfully in both MXNet 1.4.1 and MXNet 1.5.0:

    import mxnet as mx

    model_arch_path = 'model-1.19246-0.603-symbol.json'
    model_params_path = 'model-1.19246-0.603-0223.params'
    ctx = mx.cpu()
    symbol = mx.sym.load(model_arch_path)
    inputs = mx.sym.var('data', dtype='float32')
    # pick the two inference outputs from the internal nodes of the graph
    value_out = symbol.get_internals()['value_tanh0_output']
    policy_out = symbol.get_internals()['flatten0_output']
    sym = mx.symbol.Group([value_out, policy_out])
    net = mx.gluon.SymbolBlock(sym, inputs)
    net.collect_params().load(model_params_path, ctx)

Consequently, this issue can be closed.
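A mismatch of the kind described above (a symbol that still expects inputs that are neither the data input nor stored parameters) can be spotted before binding. The following is a minimal sketch under stated assumptions: `unresolved_arguments` is a hypothetical helper, the argument names would come from `sym.list_arguments()`, and the label name used in the usage note is illustrative only.

```python
def unresolved_arguments(symbol_args, arg_params, input_names=("data",)):
    """Return the symbol arguments that are neither inputs nor loaded parameters.

    symbol_args: argument names in graph order, e.g. from sym.list_arguments()
    arg_params:  mapping of parameter name -> value loaded from the params file
    input_names: names fed at inference time (assumed to be just "data" here)
    """
    known = set(arg_params) | set(input_names)
    return [name for name in symbol_args if name not in known]
```

For example, `unresolved_arguments(["data", "conv0_weight", "value_label"], {"conv0_weight": w})` returns `["value_label"]`, which would suggest that the graph still contains a training-only label input.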
See insightface #764
Description
The current MXNet master dev branch (PyPI version 1.5.0b20190623) breaks the loading of certain MXNet models (in both mxnet-mkl and mxnet-cu100) that previously loaded successfully with mxnet==1.4.1.
The model uses grouped depthwise (a.k.a. depthwise separable) convolutions, which could be the cause of this issue, because other models (e.g. CrazyAraFish_0.5.0_RiseV1.zip) still load correctly.
Environment info
I'm using Python, but the same problem also occurs when building the MXNet-CPP package from source.
Error Message:
Minimum reproducible example
Steps to reproduce
Download release CrazyAra_0.5.0_RiseV2_mobile.zip at:
Install python-chess.
Extract CrazyAra_0.5.0_RiseV2_mobile.zip and run from the command line.
More details for install instructions can be found here:
Alternatively, you can load the MXNet model from the model/ directory manually in Python.
Does someone have an idea what recent change causes this?
Can you include more automated unit tests for MXNet to ensure that the loading of different model types is preserved across version updates?
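For the manual loading route mentioned above, locating the two checkpoint files first helps separate a path problem from a genuine loading failure. This is a minimal sketch, assuming the directory holds exactly one `*-symbol.json` file and one `*.params` file, as the file names quoted in the comments suggest; `find_checkpoint_files` is a hypothetical helper and the directory layout is an assumption.

```python
from pathlib import Path


def find_checkpoint_files(model_dir):
    """Return (symbol_path, params_path) found directly inside model_dir.

    Assumes exactly one "*-symbol.json" and one "*.params" file are present;
    raises FileNotFoundError otherwise so a path problem is reported early.
    """
    model_dir = Path(model_dir)
    symbols = sorted(model_dir.glob("*-symbol.json"))
    params = sorted(model_dir.glob("*.params"))
    if len(symbols) != 1 or len(params) != 1:
        raise FileNotFoundError(
            "expected one *-symbol.json and one *.params in %s, found %d and %d"
            % (model_dir, len(symbols), len(params))
        )
    return symbols[0], params[0]
```

The returned paths can then be passed as strings to `mx.sym.load` and `mx.nd.load`.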