RNNs have several drawbacks:
linear interaction distance: distant word pairs need O(sequence length) steps to interact

lack of parallelizability: the forward and backward passes contain O(sequence length) unparallelizable operations

An intuitive picture of attention: a soft, averaging lookup table
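
A minimal NumPy sketch of this intuition (array sizes and names are made up): a query scores every key, and the result is a softmax-weighted average of all the values rather than a single retrieved entry.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hard lookup: the query matches exactly one key and returns its value.
# Soft lookup: the query scores every key, and the output is a weighted
# average of all values, with weights given by a softmax over the scores.
d = 4
keys   = np.random.randn(5, d)   # 5 stored keys
values = np.random.randn(5, d)   # 5 stored values
query  = np.random.randn(d)

scores  = keys @ query           # similarity of the query to each key
weights = softmax(scores)        # soft "which entry matched" distribution
output  = weights @ values       # weighted average of the values

print(weights.round(3), output.round(3))
```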

Self-attention: keys, queries, and values all come from the same sequence
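
A sketch of single-head self-attention under assumed dimensions; W_q, W_k, W_v are generic names for the learned projections, all applied to the same sequence x.

```python
import torch

torch.manual_seed(0)
T, d = 6, 8                      # sequence length, model dimension
x = torch.randn(T, d)            # one sequence of word vectors

# Queries, keys, and values all come from the SAME sequence x,
# each through its own learned linear projection.
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

scores = q @ k.T / d ** 0.5              # (T, T) pairwise similarities
alpha  = torch.softmax(scores, dim=-1)   # each row sums to 1
out    = alpha @ v                       # (T, d) weighted averages of values

print(out.shape)   # torch.Size([6, 8])
```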

The first problem of self-attention: it has no notion of sequence order. Fix: add position representations to the inputs.

Position representation vectors from sinusoids
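
A possible implementation of the standard sinusoidal scheme (sin and cos at geometrically spaced frequencies); the 10000 base follows the usual Transformer formula and is not from the original notes.

```python
import numpy as np

def sinusoidal_positions(max_len, d):
    """Fixed (not learned) position vectors: sinusoids of varying frequency."""
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i   = np.arange(d // 2)[None, :]           # (1, d/2)
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((max_len, d))
    pe[:, 0::2] = np.sin(angles)               # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)               # odd dimensions: cosine
    return pe

pe = sinusoidal_positions(max_len=50, d=16)
# Position vectors are simply added to the word embeddings:
# x_with_pos = word_embeddings + pe[:seq_len]
print(pe.shape)   # (50, 16)
```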

Position representation vectors learned from scratch
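
A PyTorch sketch of learned position vectors, assuming one trainable embedding per position index (max_len here is an arbitrary cap, not a value from the notes).

```python
import torch
import torch.nn as nn

max_len, d = 512, 16
pos_embedding = nn.Embedding(max_len, d)   # one learnable vector per position

T = 10
word_vectors = torch.randn(T, d)
positions    = torch.arange(T)             # 0, 1, ..., T-1
x = word_vectors + pos_embedding(positions)  # add position info, as with sinusoids
print(x.shape)   # torch.Size([10, 16])
```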

The second problem of self-attention: it has no nonlinearities; stacking attention layers just re-averages the value vectors. Fix: add a feed-forward network to each output.
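
A sketch of this fix: a position-wise feed-forward network (the dimensions are illustrative) applied independently to each self-attention output.

```python
import torch
import torch.nn as nn

d, d_ff = 16, 64
ffn = nn.Sequential(               # position-wise feed-forward network
    nn.Linear(d, d_ff),
    nn.ReLU(),                     # the nonlinearity self-attention lacks
    nn.Linear(d_ff, d),
)

T = 6
attn_output = torch.randn(T, d)    # pretend output of a self-attention layer
out = ffn(attn_output)             # same FFN applied independently at each position
print(out.shape)   # torch.Size([6, 16])
```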

The third problem of self-attention: when decoding (e.g. language modeling), the model must not look at the future. Fix: mask out future positions in self-attention.
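
A sketch of causal masking, assuming we set the scores for future positions to -inf before the softmax so their attention weights become 0.

```python
import torch

T, d = 5, 8
q, k, v = (torch.randn(T, d) for _ in range(3))

scores = q @ k.T / d ** 0.5
# Causal mask: position i may only attend to positions j <= i.
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))   # future scores -> -inf

alpha = torch.softmax(scores, dim=-1)   # future positions get weight 0
out   = alpha @ v
print(alpha)   # upper triangle (above the diagonal) is all zeros
```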

Summary: to use self-attention as a building block we need position representations (to restore order), feed-forward layers (to add nonlinearities), and masking of the future (for decoding).