number of LSTM blocks and cells #1

Open
xy0806 opened this issue May 6, 2016 · 6 comments

Comments

@xy0806

xy0806 commented May 6, 2016

Dear Yaseen,
Thanks for your clean code.
As you know, there are two concepts, the 'LSTM block' and the 'LSTM cell'. But many LSTM code examples, including yours, seem to pay no attention to this difference: they create only cells, and no blocks.
After reading and thinking about this, I came to the conclusion that an LSTM with m blocks of n cells each and an LSTM with one block of m*n cells are actually the same.
What do you think about this, and could you give me any hints on the issue?

Thanks,
Xin Yang

@uyaseen
Owner

uyaseen commented May 6, 2016

Hi Yang,

Glad to know that you found the code helpful.

The distinction between cells and blocks has eroded over time. Most modern LSTM architectures have one cell per block (which in my opinion is simpler). Regarding your question of why not much attention is paid to this difference, I am afraid I do not have a very clear answer, but I would say:

-> Do we have any empirical evidence that LSTMs with multiple cells per block work better than the "one cell per block" architecture? [I am not aware of any such evidence; without it, people will prefer the less cumbersome model.] The same applies to peephole connections: some people don't use them because they don't find them very helpful.
-> Why not increase the size of the memory (capacity) of a cell instead of adding more cells to a block? (I would prefer increasing the memory: it's simpler and more interpretable, and to me the two look equivalent as well, since both architectures use the same gates ["cells within a block use the same gates"].)
-> Simple is always better (GRUs are a simpler version of LSTMs with almost equivalent performance on many tasks, and are therefore very popular these days).

I hope the above explanation helps a bit. [1] explains the differences between various LSTM architectures.

[1] LSTM: A Search Space Odyssey, http://arxiv.org/pdf/1503.04069v1.pdf

@xy0806
Author

xy0806 commented May 7, 2016

Dear Yaseen,

Thanks for the quick and informative reply.
I think I need to ask one more key question, which is closely related to my thoughts and really confuses me now: if I want to implement an LSTM in which each block contains multiple cells, how should I modify your code? Could you teach me something about the creation step for multi-cell blocks?

Thanks,
Xin Yang
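
For readers with the same question, here is a minimal NumPy sketch of one forward step of an LSTM whose blocks hold several cells. It is not the repository's Theano code and not a confirmed answer from the maintainer. It assumes no peephole connections, a forget gate, and one input/forget/output gate triple per block that is shared by every cell in the block (the "cells within a block use the same gates" idea quoted earlier in the thread). The names `init_params` and `block_lstm_step` and the parameter layout are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_blocks, cells_per_block, seed=0):
    """Hypothetical layout: one gate triple per block, one candidate per cell."""
    rng = np.random.default_rng(seed)
    n_cells = n_blocks * cells_per_block
    n_cat = n_in + n_cells  # every unit sees the concatenation [x_t, h_{t-1}]

    def w(rows):
        return rng.normal(scale=0.1, size=(rows, n_cat))

    return {
        "W_i": w(n_blocks), "b_i": np.zeros(n_blocks),
        "W_f": w(n_blocks), "b_f": np.zeros(n_blocks),
        "W_o": w(n_blocks), "b_o": np.zeros(n_blocks),
        "W_g": w(n_cells), "b_g": np.zeros(n_cells),
    }


def block_lstm_step(x, h_prev, c_prev, p, cells_per_block):
    """One time step; h and c have length n_blocks * cells_per_block."""
    z = np.concatenate([x, h_prev])
    # One gate activation per block ...
    i = sigmoid(p["W_i"] @ z + p["b_i"])
    f = sigmoid(p["W_f"] @ z + p["b_f"])
    o = sigmoid(p["W_o"] @ z + p["b_o"])
    # ... repeated so that all cells of a block see the same gate values.
    i, f, o = (np.repeat(gate, cells_per_block) for gate in (i, f, o))
    g = np.tanh(p["W_g"] @ z + p["b_g"])  # one candidate value per cell
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


n_in, n_blocks, cells_per_block = 3, 4, 2
p = init_params(n_in, n_blocks, cells_per_block)
h = np.zeros(n_blocks * cells_per_block)
c = np.zeros_like(h)
for x in np.random.default_rng(1).normal(size=(5, n_in)):
    h, c = block_lstm_step(x, h, c, p, cells_per_block)
print(h.shape)  # (8,)
```

With `cells_per_block=1` the `np.repeat` call is a no-op and the step reduces to a plain one-cell-per-block LSTM without peepholes. In this sketch the only structural difference between the two layouts is the number of gate rows: `n_blocks` instead of `n_blocks * cells_per_block`.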


@uyaseen
Owner

uyaseen commented May 7, 2016

I have to look at a few papers again to make sure I don't miss anything, but I am travelling these days and don't even have access to my laptop, so you will have to wait at least one week for the reply (I am sorry it cannot be earlier than that :/).

@xy0806
Author

xy0806 commented May 7, 2016

OK, I can wait for that. I can play with the simplest one in the meantime. ^_^

best

@son20112074

Thanks, Yaseen and Xin Yang. This is also my problem, and now I understand it better.

@DongGuangchang

Dear Yaseen,
I am a beginner in deep learning and ran into some difficulties when debugging your recurrent neural network program.
First, there is an error when running sample.py:
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
data size: 49388, vocab size: 75
train(..)
load_data(..)
[Train] # of rows: 987
... transferring data to the GPU
Traceback (most recent call last):
... building the model
File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 145, in
n_h=100, use_existing_model=True, n_epochs=600)
File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 50, in train
rec_params = pkl.load(f)
EOFError
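
An `EOFError` raised by `pkl.load(f)`, as in the traceback above, generally means the unpickler ran out of data. One plausible cause, given `use_existing_model=True` in the call, is a saved-model file that exists but is empty or truncated; this is a guess, not a confirmed diagnosis. The sketch below shows a defensive loader under that assumption. It uses only the Python 3 standard library, and `load_params_or_none` is a hypothetical helper, not a function from the repository.

```python
import os
import pickle
import tempfile


def load_params_or_none(path):
    """Return the unpickled object, or None if the file is missing, empty, or truncated."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return None
    try:
        with open(path, "rb") as f:  # pickles must be opened in binary mode
            return pickle.load(f)
    except (EOFError, pickle.UnpicklingError):
        return None


with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "empty.pkl")
    open(empty, "wb").close()  # zero-byte file, like an interrupted save
    good = os.path.join(d, "good.pkl")
    with open(good, "wb") as f:
        pickle.dump({"n_h": 100}, f)
    print(load_params_or_none(empty))  # None
    print(load_params_or_none(good))   # {'n_h': 100}
```

A caller could fall back to freshly initialised parameters whenever the helper returns `None`, instead of crashing on the first run before any model has been saved.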


Second, there is an error when running train.py:
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 2) has dtype int32, while the result of the inner function (fn) has dtype int64. This can happen if the inner function of scan results in an upcast or downcast.
I hope you can help explain the reasons for the errors above. Thank you very much!
Thanks,
Liang Dong
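
The `ValueError` above says that the initial state passed to scan is `int32` while the inner function returns `int64`. The same kind of upcast can be reproduced outside Theano. The sketch below is plain NumPy, not Theano, and `step` and `run` are hypothetical stand-ins for scan's inner function and loop. It shows the mismatch and one way to remove it, namely creating the initial state with the dtype the step actually produces. In Theano the analogous change would be to cast the initial state (or the step result), for example with `theano.tensor.cast`, so that both sides agree; where exactly to cast in train.py is not something this thread establishes.

```python
import numpy as np


def step(state, x):
    """Toy recurrence standing in for scan's inner function."""
    return state + x


def run(init_state, xs):
    """Apply step over xs, insisting that the state dtype never changes."""
    state = init_state
    for x in xs:
        new_state = step(state, x)
        if new_state.dtype != init_state.dtype:
            raise ValueError(
                f"initial state is {init_state.dtype}, step returned {new_state.dtype}"
            )
        state = new_state
    return state


xs = np.arange(6, dtype=np.int64).reshape(3, 2)

try:
    run(np.zeros(2, dtype=np.int32), xs)  # int32 state + int64 input upcasts
except ValueError as err:
    print(err)  # initial state is int32, step returned int64

# Fix: give the initial state the dtype that the step produces.
final = run(np.zeros(2, dtype=np.int64), xs)
print(final, final.dtype)  # [6 9] int64
```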
