Learning Longer Memory in Recurrent Neural Networks

Mikolov, Tomas; Joulin, Armand; Chopra, Sumit; Mathieu, Michael; Ranzato, Marc'Aurelio

arXiv:1412.7753 (cs)

[Submitted on 24 Dec 2014 (v1), last revised 16 Apr 2015 (this version, v2)]

Abstract: The recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer-term patterns in real data, such as natural language, is perfectly possible using gradient descent. This is achieved by a slight structural modification of the simple recurrent neural network architecture. We encourage some of the hidden units to change their state slowly by making part of the recurrent weight matrix close to identity, thus forming a kind of longer-term memory. We evaluate our model in language modeling experiments, where we obtain performance similar to that of the much more complex Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997).
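
The structural idea in the abstract, a block of hidden units whose recurrent weights stay close to identity, can be illustrated with a small sketch. This is not the authors' code: the split into "fast" and "slow" units, the matrix names `A`, `R`, `P`, `B`, the mixing constant `alpha`, and the choice to fix the slow block's recurrent weights to exactly `alpha * I` are all assumptions made for illustration.

```python
import math
import random


def make_params(n_in, n_fast, n_slow, seed=0):
    """Random small weights for a toy recurrent cell (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "A": mat(n_fast, n_in),    # input -> fast hidden units
        "R": mat(n_fast, n_fast),  # fast -> fast (ordinary recurrent weights)
        "P": mat(n_fast, n_slow),  # slow -> fast
        "B": mat(n_slow, n_in),    # input -> slow units
    }


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def step(params, x, h, s, alpha=0.95):
    """One time step.

    Assumption: the slow units use a recurrent block fixed to alpha * I,
    so each slow unit is a leaky average of its own past and the input.
    """
    bx = matvec(params["B"], x)
    s_new = [(1.0 - alpha) * b + alpha * sp for b, sp in zip(bx, s)]

    ax = matvec(params["A"], x)
    rh = matvec(params["R"], h)
    ps = matvec(params["P"], s_new)
    # Fast units: standard sigmoid recurrent update, also reading the slow state.
    h_new = [1.0 / (1.0 + math.exp(-(a + r + p))) for a, r, p in zip(ax, rh, ps)]
    return h_new, s_new


if __name__ == "__main__":
    params = make_params(n_in=3, n_fast=4, n_slow=2, seed=1)
    h, s = [0.0] * 4, [0.0] * 2
    # One input impulse, then silence: the slow state decays geometrically.
    h, s = step(params, [1.0, 0.0, 0.0], h, s, alpha=0.9)
    for t in range(5):
        h, s = step(params, [0.0, 0.0, 0.0], h, s, alpha=0.9)
        print(t, s)
```

With `alpha` near 1, the slow state shrinks by only a factor of `alpha` per step when the input is silent, which is the sense in which near-identity recurrent weights act as a longer-term memory in this sketch.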

Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Cite as: arXiv:1412.7753 [cs.NE] (or arXiv:1412.7753v2 [cs.NE] for this version)
DOI: https://doi.org/10.48550/arXiv.1412.7753

Submission history

From: Tomas Mikolov
[v1] Wed, 24 Dec 2014 20:58:18 UTC (222 KB)
[v2] Thu, 16 Apr 2015 23:37:58 UTC (223 KB)
