This is fantastic. Thanks for taking the time to write this (and the whole series). One tiny thing that I found a bit confusing and that could perhaps do with a little more clarification:
Output will be the last state of every layer in the network as an LSTMStateTuple stored in current_state, as well as a tensor states_series with the shape [batch_size, truncated_backprop_length, state_size] containing the hidden state of the last layer across all time-steps.
Could possibly be expanded as:
Output will be the **internal state (both cell state and hidden state)** of every layer in the network **for the final time-step** as a **tuple (for each layer) of** LSTMStateTuple stored in current_state, as well as a tensor states_series with the shape [batch_size, truncated_backprop_length, state_size] containing the **output** of the last layer **for each time-step**.
(The bold bits are not for emphasis, they’re just to indicate which bits I changed).
This doesn’t contradict what you say; it just avoids some ambiguity (at least it wasn’t clear to me just from reading it).
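To make the distinction concrete, here is a minimal sketch against the TF 1.x API. I'm assuming tf.nn.rnn_cell.LSTMCell / MultiRNNCell with tf.nn.dynamic_rnn, and my own placeholder choices of num_layers=2 and an input width of 1, which may differ from the post's exact setup:

```python
import tensorflow as tf  # TF 1.x assumed, matching the era of this series

# Sizes from the post; num_layers=2 and the input width of 1 are my own
# illustrative choices, not necessarily what the post uses.
batch_size = 3
truncated_backprop_length = 3
state_size = 3
num_layers = 2

inputs = tf.placeholder(tf.float32,
                        [batch_size, truncated_backprop_length, 1])

cells = [tf.nn.rnn_cell.LSTMCell(state_size) for _ in range(num_layers)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

# states_series: the hidden output of the LAST layer at EVERY time-step.
# current_state: a tuple with one LSTMStateTuple(c, h) PER LAYER, holding
#                that layer's state at the FINAL time-step only.
states_series, current_state = tf.nn.dynamic_rnn(multi_cell, inputs,
                                                 dtype=tf.float32)

print(states_series.shape)        # (3, 3, 3) = [batch_size, truncated_backprop_length, state_size]
print(len(current_state))         # 2 = num_layers
print(current_state[-1].c.shape)  # (3, 3) = cell state of the top layer at the final step
print(current_state[-1].h.shape)  # (3, 3) = hidden state of the top layer at the final step
# Note: states_series[:, -1, :] carries the same values as current_state[-1].h
```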
Finally, with batch_size=3, state_size=3 and truncated_backprop_length=3 it’s a bit tricky to read the diagrams, since so many dimensions are of size 3! If, say, batch_size were 4 and state_size were 5, it would be immediately obvious which dimension is which.