To run the CPRS baseline model #9

Open
seongeunryu opened this issue Apr 22, 2023 · 9 comments

@seongeunryu

Hello, thank you for introducing a great model.

I would like to run the CPRS baseline model, and I'm wondering if I just need to run 'python CPRS.py'. If not, please let me know the correct way to run it.
Also, I couldn't find the data needed to build embdict[word], which is /home/sansa/dataset/no.tsv. Could you share it with me? I am trying to run it using glove.840B.300d.txt, but various errors occur.

Thank you for your reply.
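
A note for readers hitting similar errors with glove.840B.300d.txt: a few tokens in that file contain spaces, so splitting each line on whitespace and taking the first field as the word makes the float conversion fail. Below is a minimal sketch of a more tolerant loader, assuming the usual "token v1 ... v300" text layout. The function name `load_glove` and the list-of-floats value type are illustrative and not taken from CPRS.py, and the format of no.tsv is not shown in this thread.

```python
def load_glove(path, dim=300):
    """Read a GloVe-style text file into {token: [float, ...]}.

    Hypothetical helper, not part of CPRS.py. The last `dim` fields of each
    line are treated as the vector and everything before them as the token,
    so tokens that contain spaces are still parsed.
    """
    embdict = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\r\n").split(" ")
            if len(parts) <= dim:
                continue  # too short to hold a token plus a full vector
            word = " ".join(parts[:-dim])
            try:
                vector = [float(x) for x in parts[-dim:]]
            except ValueError:
                continue  # skip lines whose vector fields are not numeric
            embdict[word] = vector
    return embdict

# Usage (the path is an assumption):
# embdict = load_glove("glove.840B.300d.txt", dim=300)
```

If the surrounding code expects NumPy arrays, each list can be wrapped with `numpy.asarray(vector, dtype="float32")`.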

@summmeer
Owner

Yes, I guess you can directly run it. The missing file is here.

@seongeunryu
Author

seongeunryu commented Apr 22, 2023

Thank you very much for your prompt response.
Would it be possible to also receive the '/home/sansa/dataset/Adressa/articles_category.pkl' file necessary for running the CPRS.py file? I apologize for taking up your time.

Also, while trying to run CPRS, I could not obtain the 'articles_titles_2.pkl' file that the code loads as articles_content, so I replaced it with 'articles_titles_4.pkl'. Is there any problem with this?

@summmeer
Owner

The category file is easy to obtain; please refer to #8 (comment).
As for the second question: yes, that replacement is fine.

@seongeunryu
Author

Thank you so much for explaining kindly. Sorry to bother you again.
Thanks to the information you provided, I was able to run the CPRS.py code. However, an Out Of Memory error occurred and the training did not proceed.

I have two TITAN Xp 12GB graphics cards, but even when I set the batch size to 1, the following error occurred...

2023-04-23 06:50:20.307285: W tensorflow/c/c_api.cc:300] Operation '{name:'loss/AddN' id:3198 op device:{requested: '', assigned: ''} def:{{{node loss/AddN}} = AddN[N=2, T=DT_FLOAT, has_manual_control_dependencies=true](loss/mul, loss/mul_1)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-04-23 06:50:20.399976: W tensorflow/c/c_api.cc:300] Operation '{name:'training/Adam/dense_2/kernel/v/Assign' id:4898 op device:{requested: '', assigned: ''} def:{{{node training/Adam/dense_2/kernel/v/Assign}} = AssignVariableOp[has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](training/Adam/dense_2/kernel/v, training/Adam/dense_2/kernel/v/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-04-23 06:50:21.799618: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6913350000 exceeds 10% of free system memory.
2023-04-23 06:50:23.295928: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6913350000 exceeds 10% of free system memory.
2023-04-23 06:50:24.794740: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6913350000 exceeds 10% of free system memory.
2023-04-23 06:50:26.830885: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6913350000 exceeds 10% of free system memory.
2023-04-23 06:50:29.420800: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6913350000 exceeds 10% of free system memory.
2023-04-23 06:50:50.141729: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.44GiB (rounded to 6913350144)requested by op training/Adam/gradients/gradients/zeros
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-04-23 06:50:50.141804: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2023-04-23 06:50:50.141845: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): Total Chunks: 84, Chunks in use: 84. 21.0KiB allocated for chunks. 21.0KiB in use in bin. 344B client-requested in use in bin.
2023-04-23 06:50:50.141876: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): Total Chunks: 20, Chunks in use: 20. 10.0KiB allocated for chunks. 10.0KiB in use in bin. 7.8KiB client-requested in use in bin.
. . .
2023-04-23 06:50:50.145528: I tensorflow/tsl/framework/bfc_allocator.cc:1107] Sum Total of in-use chunks: 6.53GiB
2023-04-23 06:50:50.145548: I tensorflow/tsl/framework/bfc_allocator.cc:1109] Total bytes in pool: 11920474112 memory_limit_: 11920474112 available bytes: 0 curr_region_allocation_bytes_: 23840948224
2023-04-23 06:50:50.145580: I tensorflow/tsl/framework/bfc_allocator.cc:1114] Stats:
Limit: 11920474112
InUse: 7013281536
MaxInUse: 7013281536
NumAllocs: 147
MaxAllocSize: 6913350144
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0

2023-04-23 06:50:50.145649: W tensorflow/tsl/framework/bfc_allocator.cc:497] ***********************************************************_________________________________________
2023-04-23 06:50:50.145724: W tensorflow/core/framework/op_kernel.cc:1807] OP_REQUIRES failed at constant_op.cc:81 : RESOURCE_EXHAUSTED: OOM when allocating tensor of shape [512100,15,225] and type float
Traceback (most recent call last):
  File "CPRS.py", line 515, in <module>
    model.fit_generator(traingen, epochs = 3, steps_per_epoch = len(train_label)//batch_size)
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/engine/training_v1.py", line 1356, in fit_generator
    return self.fit(
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/engine/training_v1.py", line 856, in fit
    return func.fit(
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/engine/training_generator_v1.py", line 647, in fit
    return fit_generator(
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/engine/training_generator_v1.py", line 282, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/engine/training_v1.py", line 1181, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/backend.py", line 4606, in __call__
    self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
  File "/home/yyko/.local/lib/python3.8/site-packages/keras/backend.py", line 4531, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File "/home/yyko/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1514, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File "/home/yyko/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1472, in __init__
    self._handle = tf_session.TF_SessionMakeCallable(
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [512100,15,225] and type float
	 [[{{node training/Adam/gradients/gradients/zeros}}]]

Lastly, if it's not too much trouble, could I receive information on the minimum system requirements to train this code?
What kind of environment did the author use to run it?
I would appreciate any advice on how to make the training work in my environment. Thank you.
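
The numbers in the log above can be checked by hand, which helps explain why lowering the batch size made no difference. The sketch below recomputes the size of the float32 tensor of shape [512100, 15, 225] named in the error and compares it with the free GPU memory implied by the `Limit` and `InUse` stats. The helper name `tensor_bytes` is illustrative, and what this tensor represents inside CPRS.py is not stated in the thread.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (float32 uses 4 bytes per element)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element

# Values copied from the log above.
shape = (512100, 15, 225)   # tensor in the OOM message
limit = 11920474112         # "Limit" from the allocator stats
in_use = 7013281536         # "InUse" from the allocator stats

needed = tensor_bytes(shape)
free = limit - in_use
fits = free >= needed

print(needed)                      # 6913350000, same as the CPU allocation warnings
print(round(needed / 2**30, 2))    # 6.44 (GiB), same as the bfc_allocator message
print(round(free / 2**30, 2))      # 4.57 (GiB) left on the GPU
print(fits)                        # False
```

The leading dimension of the tensor is 512100, not the batch size, so its footprint stays at about 6.44 GiB even with a batch size of 1. `MaxAllocSize` also equals the rounded request (6913350144), which suggests one buffer of this size was already resident when the gradient buffer of the same shape was requested; two such buffers (about 12.9 GiB) exceed the 11.1 GiB limit of a single 12 GB card.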

@seongeunryu
Author

Could you please check whether running CPRS requires a lot of memory?

@summmeer
Owner

No, I think the memory usage is totally tolerable.

@amengpa

amengpa commented Jun 22, 2023

Hello, thank you very much for providing a great source code.
I tried to build the file '/home/sansa/dataset/Adressa/articles_category.pkl', but I couldn't create it.
I'm sorry to ask, but could you share the code that generates this file?
Your help would be greatly appreciated.

@amengpa

amengpa commented Jun 22, 2023

I could not find where 'category_id' is. I'd really appreciate your help.

@summmeer
Owner

Maybe this file can help you. @amengpa

articles_embeddings+titles.zip Originally posted by @summmeer in #7 (comment)
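
For readers who still need to build a category file themselves, here is a rough sketch of one way to derive an article-to-category mapping from raw event logs. Everything specific in it is an assumption: it presumes JSON-lines events carrying an `id` field for the article and a `category1` field such as "nyheter|trondheim", and it writes a plain dict of article id to integer index. The layout that CPRS.py actually expects for articles_category.pkl is not shown in this thread, so the output may need reshaping.

```python
import json
import pickle

def build_article_categories(event_paths, out_path):
    """Hypothetical helper: map each article id to an integer category index.

    Assumes one JSON object per line with "id" and "category1" fields; the
    top-level part of "category1" (before the first "|") is used as the label.
    """
    category_ids = {}      # category name -> integer index
    article_category = {}  # article id -> integer index
    for path in event_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                event = json.loads(line)
                article = event.get("id")
                category = event.get("category1")
                if article is None or category is None:
                    continue  # event does not refer to a categorized article
                top_level = category.split("|")[0]
                index = category_ids.setdefault(top_level, len(category_ids))
                article_category[article] = index
    with open(out_path, "wb") as f:
        pickle.dump(article_category, f)
    return article_category, category_ids
```

Returning `category_ids` alongside the mapping makes it easy to check how many distinct categories were found before wiring the file into training.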
