RNNs are used to work with sequential data. Unlike feedforward ANNs, they can
handle inputs of varying length, and they can learn dependencies between
elements of a sequence.
-
The state at a timestep is a nonlinear function of a linear combination of the
state at the previous timestep and the input at that timestep. The output at a
timestep is a function of the state at that timestep.
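In symbols, one common parameterization (the names h_t, x_t, y_t, the weights W, U, V, b, c, and the tanh nonlinearity are assumptions, not from these notes):

h_t = tanh(W h_{t-1} + U x_t + b)
y_t = g(V h_t + c)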
-
RNNs share parameters across timesteps.
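A minimal NumPy sketch of a forward pass (all names here are assumptions): the same W, U, V, b, c are applied at every timestep, which is the parameter sharing.

```python
import numpy as np

def rnn_forward(xs, h0, W, U, V, b, c):
    """Run an RNN over a sequence; parameters are shared across timesteps."""
    h, ys = h0, []
    for x in xs:                        # one iteration per timestep
        h = np.tanh(W @ h + U @ x + b)  # state update (same W, U, b each step)
        ys.append(V @ h + c)            # output from current state (same V, c)
    return ys, h

# Toy usage: 5 timesteps of 3-dim inputs, 4-dim state, 2-dim outputs.
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))
W, U = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))
V, b, c = rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2)
ys, h = rnn_forward(xs, np.zeros(4), W, U, V, b, c)
```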
-
Parameters are learned through Backpropagation Through Time (BPTT): the network
is unrolled across timesteps and gradients are backpropagated through the
unrolled graph, with each shared parameter accumulating gradient contributions
from every step at which it was used.
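A hedged sketch of BPTT via autograd (PyTorch here; all names and sizes are assumptions). Unrolling the loop builds one computation graph over all timesteps, so a single backward pass sums each shared parameter's gradient across every step:

```python
import torch

W = torch.randn(4, 4, requires_grad=True)  # state-to-state weights
U = torch.randn(4, 3, requires_grad=True)  # input-to-state weights
V = torch.randn(2, 4, requires_grad=True)  # state-to-output weights

xs = torch.randn(5, 3)   # toy sequence: 5 timesteps, 3 features each
ts = torch.randn(5, 2)   # toy targets, one per timestep
h = torch.zeros(4)

loss = torch.tensor(0.0)
for x, t in zip(xs, ts):                     # unroll the RNN over time
    h = torch.tanh(W @ h + U @ x)            # shared parameters reused each step
    loss = loss + ((V @ h - t) ** 2).mean()  # accumulate per-step loss

loss.backward()  # BPTT: gradients flow back through the whole unrolled graph
# W.grad, U.grad, V.grad now sum contributions from all 5 timesteps.
```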
-
Sequence generation: the output at each timestep is fed back in as the next
input, producing the sequence one element at a time.
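A sketch of the generation loop (rnn_step and sample_next are hypothetical helpers standing in for a trained model):

```python
def generate(rnn_step, sample_next, h, x, n_steps):
    """Roll an RNN forward, feeding each sampled output back in as input.

    Hypothetical helpers: rnn_step(h, x) returns the new state and an output
    distribution; sample_next draws the next token from that distribution.
    """
    out = []
    for _ in range(n_steps):
        h, y = rnn_step(h, x)  # advance the state, get output distribution
        x = sample_next(y)     # sampled token becomes the next input
        out.append(x)
    return out
```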
-
Sequence translation: an input sequence is mapped to an output sequence, e.g.
machine translation.
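A minimal encoder-decoder sketch (PyTorch; all sizes are arbitrary assumptions): the encoder's final state summarizes the input sequence and seeds the decoder.

```python
import torch

enc = torch.nn.GRU(input_size=3, hidden_size=4)  # reads the source sequence
dec = torch.nn.GRU(input_size=2, hidden_size=4)  # emits the target sequence

src = torch.randn(6, 1, 3)  # source: (seq_len, batch, features)
tgt = torch.randn(5, 1, 2)  # target-side inputs
_, h = enc(src)             # h: final encoder state, shape (1, 1, 4)
out, _ = dec(tgt, h)        # decode conditioned on the encoder state
```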
-
...todo
Simple RNNs suffer from the vanishing and exploding gradient problems:
gradients propagated back through many timesteps can shrink toward zero or grow
without bound. To mitigate this, gated variants of the RNN architecture such as
the LSTM and GRU were introduced.
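A minimal sketch with PyTorch's built-in modules (shapes and hyperparameters here are arbitrary). The gated cells mainly address vanishing gradients; gradient clipping, shown below, is a common complementary fix for exploding gradients:

```python
import torch

lstm = torch.nn.LSTM(input_size=3, hidden_size=4)  # gated variant: LSTM
gru = torch.nn.GRU(input_size=3, hidden_size=4)    # gated variant: GRU
opt = torch.optim.SGD(lstm.parameters(), lr=0.1)

x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)
out, _ = lstm(x)
loss = out.pow(2).mean()  # toy loss just to produce gradients
loss.backward()

# Clip the gradient norm to tame exploding gradients before the update.
torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=1.0)
opt.step()
```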