There's a lot of work being done on this specific part. If you have a standard RNN architecture you want to run, you can probably use the cuDNN bindings in tf.contrib.cudnn_rnn to get a super fast implementation.
There is still some performance work to be done on properly caching weights between time steps of an RNN when you use a tf.nn.RNNCell. Currently, if you want to implement a custom architecture, a seq2seq decoder, or an RL agent, this is the API you would want to use. Several of the eager benchmarks are based on this API, so its performance will only improve.
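To see why weight caching between time steps matters, here's a minimal NumPy sketch (with assumed shapes and names, not TensorFlow's actual RNNCell internals) of the basic RNN recurrence: the same weight matrices are reused at every step, so re-fetching or re-transforming them per step is pure overhead.

```python
import numpy as np

# Hypothetical minimal RNN loop, just to illustrate the recurrence.
# W_x and W_h are identical at every time step -- that reuse is
# what a cell implementation wants to cache across steps.
def simple_rnn(inputs, W_x, W_h, b):
    """inputs: (time, batch, input_dim); returns final hidden state."""
    h = np.zeros((inputs.shape[1], W_h.shape[0]))
    for x_t in inputs:  # one matmul pair per time step
        h = np.tanh(x_t @ W_x.T + h @ W_h.T + b)
    return h

rng = np.random.default_rng(0)
T, B, D, H = 5, 2, 3, 4  # time, batch, input dim, hidden dim
out = simple_rnn(rng.normal(size=(T, B, D)),
                 rng.normal(size=(H, D)),
                 rng.normal(size=(H, H)),
                 np.zeros(H))
print(out.shape)  # (2, 4)
```

A fused kernel like cuDNN's bakes this whole loop into one call, which is where the big speedups over a per-step Python loop come from.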
I'm hopeful that for the next major release, we'll also have support for eager in tf.contrib.seq2seq.