Error while training #4

Open
talk2kabir opened this issue Aug 11, 2020 · 4 comments

@talk2kabir

I ran `python3 run.py HS-B`, as described in the README, to train a new model, but I keep getting the following error for both the ATIS and HS datasets:
HS-B

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4761/4761 [00:01<00:00, 2964.66it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576/576 [00:00<00:00, 2962.62it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 594/594 [00:00<00:00, 2817.37it/s]
(?, 200, 256)
(?, 200, 256)
(?, 200, 256)
(?, 200, 256)
(?, 200, 256)
2020-08-11 16:03:29.502581: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
create a new model
0%| | 0/1000 [00:00<?, ?it/s2020-08-11 16:03:43.179957: W tensorflow/core/framework/allocator.cc:122] Allocation of 655360000 exceeds 10% of system memory. | 0/34 [00:00<?, ?it/s]
2020-08-11 16:03:45.284542: W tensorflow/core/framework/allocator.cc:122] Allocation of 655360000 exceeds 10% of system memory.
2020-08-11 16:03:47.412426: W tensorflow/core/framework/allocator.cc:122] Allocation of 655360000 exceeds 10% of system memory.
2020-08-11 16:03:49.541871: W tensorflow/core/framework/allocator.cc:122] Allocation of 655360000 exceeds 10% of system memory.
current accuracy 0.0 string accuarcy is 2
find the better accuracy 0.0in epoches 0
2020-08-11 16:04:03.734827: W tensorflow/core/framework/allocator.cc:122] Allocation of 655360000 exceeds 10% of system memory.
7215
current accuracy 0.0 string accuracy is 2
2020-08-11 16:04:15.335815: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:109 : Not found: HS-B/save_list/save0_1; No such file or directory
0%| | 0/34 [00:39<?, ?it/s]
0%| | 0/1000 [00:39<?, ?it/s]
Traceback (most recent call last):
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: HS-B/save_list/save0_1; No such file or directory

Caused by op 'save_1/SaveV2', defined at:
File "run.py", line 302, in
main()
File "run.py", line 300, in main
run()
File "run.py", line 288, in run
save_model_time(sess, i, str(int(Code_gen_model.steps)))
File "run.py", line 111, in save_model_time
saver = tf.train.Saver()
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in init
self.build()
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 792, in _build_internal
save_tensor = self._AddSaveOps(filename_tensor, saveables)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 284, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 202, in save_op
tensors)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1690, in save_v2
shape_and_slices=shape_and_slices, tensors=tensors, name=name)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 302, in
main()
File "run.py", line 300, in main
run()
File "run.py", line 288, in run
save_model_time(sess, i, str(int(Code_gen_model.steps)))
File "run.py", line 112, in save_model_time
saver.save(session, project + "save_list/save" + str(number) + "_" + str(card) + "/model.cpkt")
File "/home/lab/anaconda3/envs/py37/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1458, in save
raise exc
ValueError: Parent directory of HS-B/save_list/save0_1/model.cpkt doesn't exist, can't save.

I would appreciate it if you could suggest a solution to this problem.

Thank you

@zysszy
Owner

zysszy commented Aug 11, 2020

I am sorry for this bug. I have fixed it.

You can work around it by creating a new folder named save_list inside HS-B (or ATIS), or by commenting out the buggy line.

Thank you for your attention.

Zeyu
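
For readers who would rather create the folder from code than by hand, here is a minimal sketch of the first workaround. It is not taken from the repository: the helper name `ensure_checkpoint_dir` is hypothetical, and the path layout is simply copied from the `saver.save(...)` call shown in the traceback above.

```python
import os


def ensure_checkpoint_dir(project, number, card):
    """Create <project>/save_list/save<number>_<card> if it is missing.

    Hypothetical helper: the layout mirrors the path built in the
    traceback (project + "save_list/save" + number + "_" + card).
    Returns the checkpoint path that would be passed to saver.save().
    """
    ckpt_dir = os.path.join(project, "save_list",
                            "save" + str(number) + "_" + str(card))
    # exist_ok=True makes the call safe to repeat on every save.
    os.makedirs(ckpt_dir, exist_ok=True)
    return os.path.join(ckpt_dir, "model.cpkt")


# Assumed usage inside save_model_time (not verified against run.py):
#   saver.save(session, ensure_checkpoint_dir(project, number, card))
```

Creating only `save_list` was enough in this thread; the sketch creates the full checkpoint directory as well, which is harmless if it already exists.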

@talk2kabir
Author

It works fine after creating the folder. Thanks for your timely response.

@riyaj8888

`Traceback (most recent call last):
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[6,26,0] = 292 is not in [0, 292)
[[{{node embedding_lookup_6}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 302, in
main()
File "run.py", line 300, in main
run()
File "run.py", line 245, in run
ac1, loss1, _ = g_eval(sess, Code_gen_model, valid_batch[k])
File "run.py", line 207, in g_eval
model.is_train: False
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[6,26,0] = 292 is not in [0, 292)
[[node embedding_lookup_6 (defined at C:\Users\7000024203\Documents\TreeGen\code_generate_model.py:677) ]]

Caused by op 'embedding_lookup_6', defined at:
File "run.py", line 302, in
main()
File "run.py", line 300, in main
run()
File "run.py", line 225, in run
batch_size, NL_vocabu_size, Tree_vocabu_size, NL_len, Tree_len, parent_len, learning_rate, keep_prob, len(char_vocabulary), rules_len)
File "C:\Users\7000024203\Documents\TreeGen\code_generate_model.py", line 677, in init
em_Rule_Son = tf.nn.embedding_lookup(self.Tree_embedding, self.inputrulelistson)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\ops\embedding_ops.py", line 316, in embedding_lookup
transform_fn=None)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\ops\embedding_ops.py", line 133, in _embedding_lookup_and_transform
result = _clip(array_ops.gather(params[0], ids, name=name),
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3273, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3747, in gather_v2
"GatherV2", params=params, indices=indices, axis=axis, name=name)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
op_def=op_def)
File "C:\Users\7000024203\Anaconda3\envs\TreeGenEnvs\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[6,26,0] = 292 is not in [0, 292)
`

I am facing the above error during training.
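
The message `indices[6,26,0] = 292 is not in [0, 292)` says that an id of 292 was looked up in an embedding table with 292 rows, whose valid ids are 0 through 291. The thread does not establish why that id appears. As a debugging aid only, here is a sketch that scans a nested batch of ids for out-of-range values before it is fed to the model. The function name is hypothetical, and the batch is assumed to be nested Python lists (a NumPy array could be converted with `.tolist()` first).

```python
def find_out_of_range(ids, vocab_size, prefix=()):
    """Return (index_tuple, value) pairs for ids outside [0, vocab_size).

    Hypothetical helper, not part of the repository. `ids` is a nested
    list or tuple of integers.
    """
    bad = []
    for i, item in enumerate(ids):
        if isinstance(item, (list, tuple)):
            # Recurse into the next dimension, extending the index prefix.
            bad.extend(find_out_of_range(item, vocab_size, prefix + (i,)))
        elif not 0 <= item < vocab_size:
            bad.append((prefix + (i,), item))
    return bad


# Toy batch: one id equals the vocabulary size, so it is out of range.
batch = [[[3], [291]], [[292], [0]]]
print(find_out_of_range(batch, 292))  # [((1, 0, 0), 292)]
```

If such a check reports ids equal to the vocabulary size, the vocabulary used to build the batches and the size passed to the model probably disagree, but that is an assumption to verify against the actual data.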

@zysszy
Owner

zysszy commented Nov 29, 2020

Hello,
Could you please provide more details about this exception (e.g., the environment, the TensorFlow version, the command you used, and the dataset you used)?

Zeyu
