From recurrence (RNNs) to attention-based NLP models

RNNs have several drawbacks:

Linear interaction distance: two words that are far apart in a sequence can only interact through O(sequence length) recurrent steps, so gradients must survive many steps and long-distance dependencies are hard to learn.


Lack of parallelizability: each hidden state depends on the previous one, so both the forward and backward passes involve O(sequence length) unparallelizable operations; GPUs cannot exploit their parallelism across time steps.


An intuitive picture of attention: a soft, averaging lookup table. In an ordinary lookup table a query matches exactly one key and returns its value; in attention the query softly matches every key, and the output is an average of all the values, weighted by those match scores.

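A minimal sketch of this view, with toy dimensions (nothing here is from a particular library): instead of returning one value for an exact key match, the query scores every key, and the result is a softmax-weighted average of all the values.

```python
import torch

torch.manual_seed(0)
keys = torch.randn(5, 8)      # 5 keys, each of dimension 8
values = torch.randn(5, 16)   # 5 values, each of dimension 16
query = torch.randn(8)

weights = torch.softmax(keys @ query, dim=0)  # soft match: weights sum to 1 over keys
output = weights @ values                     # weighted average of all 5 values
```

Unlike a dict lookup, every value contributes a little; the softmax just concentrates mass on the best-matching keys.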

Self-attention: the keys, queries, and values are all derived from the same sequence.

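A sketch of a single self-attention head, assuming one input sequence `x` of shape (seq_len, d_model); the projection names `W_q`, `W_k`, `W_v` follow the usual convention, and the division by sqrt(d_model) is the standard scaled dot product.

```python
import math
import torch
import torch.nn as nn

d_model, seq_len = 64, 10
x = torch.randn(seq_len, d_model)           # one sequence of embeddings

W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

q, k, v = W_q(x), W_k(x), W_v(x)            # all three come from the SAME x
scores = q @ k.T / math.sqrt(d_model)       # (seq_len, seq_len) pairwise scores
attn = torch.softmax(scores, dim=-1)        # each row: weights over all positions
out = attn @ v                              # (seq_len, d_model)
```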

The first problem with self-attention: it is permutation-invariant, so it has no built-in notion of sequence order. The fix is to add a position representation p_i to each input embedding, x_i + p_i; two ways to build p_i follow.


Option 1: position representations from sinusoids. Each p_i is built from sine and cosine functions of varying frequency; these vectors are fixed rather than learned and can in principle extend to positions longer than any seen in training.

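A sketch of the sinusoidal scheme from "Attention Is All You Need": even dimensions get sin(pos / 10000^(2i/d_model)) and odd dimensions get cos of the same argument, so the whole table is computed once with no trainable parameters.

```python
import torch

def sinusoidal_positions(max_len, d_model):
    pos = torch.arange(max_len).unsqueeze(1).float()   # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()            # even dimension indices
    freq = 1.0 / (10000 ** (i / d_model))              # geometric frequency progression
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)                # even dims: sine
    pe[:, 1::2] = torch.cos(pos * freq)                # odd dims: cosine
    return pe                                          # fixed, not learned

pe = sinusoidal_positions(max_len=512, d_model=64)
```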

Option 2: position representation vectors learned from scratch. Each p_i is a trainable parameter, learned like any other embedding; this is more flexible, but cannot extrapolate to positions beyond the maximum training length.

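A sketch of the learned alternative, assuming a fixed maximum length `max_len`: one trainable vector per position, added to the word embeddings.

```python
import torch
import torch.nn as nn

max_len, d_model, seq_len = 512, 64, 10
pos_embed = nn.Embedding(max_len, d_model)   # learned, unlike the sinusoids above

x = torch.randn(seq_len, d_model)            # word embeddings for one sequence
positions = torch.arange(seq_len)            # 0, 1, ..., seq_len-1
x = x + pos_embed(positions)                 # the x_i + p_i fix from above
```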

The second problem with self-attention: on its own it only computes weighted averages of value vectors, with no elementwise nonlinearity. The fix is to apply a position-wise feed-forward network to the output at each position.

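A sketch of that fix, a position-wise feed-forward network (the hidden width `d_ff` is a typical choice, not mandated):

```python
import torch
import torch.nn as nn

d_model, d_ff = 64, 256
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),                      # the elementwise nonlinearity attention lacks
    nn.Linear(d_ff, d_model),
)

out = torch.randn(10, d_model)      # stand-in for a self-attention layer's output
out = ffn(out)                      # applied independently at every position
```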

The third problem with self-attention: in language modeling the model must not see the future, yet attention can look at the whole sequence. The fix is to mask the future by setting attention scores for future positions to negative infinity before the softmax.

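A sketch of causal masking, assuming a raw score matrix `scores`: entries above the diagonal (future positions) are set to -inf, so after the softmax they get exactly zero weight.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)                     # raw attention scores
causal = torch.tril(torch.ones(seq_len, seq_len)).bool()   # True on/below diagonal
scores = scores.masked_fill(~causal, float("-inf"))        # hide the future
attn = torch.softmax(scores, dim=-1)                       # rows still sum to 1
# Row i now puts zero weight on every future position j > i.
```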

Summary: self-attention connects any two positions with a constant interaction distance and parallelizes across the whole sequence, fixing both RNN drawbacks; but to use it as a building block we need position representations for order, feed-forward nonlinearities, and future masking for autoregressive language modeling.