Before..
RNN, LSTM: slow to train
Can we parallelize sequential data?

Transformers
The input sequence can be processed in parallel
No concept of time step
All the words are passed in simultaneously and their word embeddings are determined simultaneously (an RNN passes input words one after another)

Input Embedding
In embedding space, words with similar meanings are located close to each other
There are already pretrained embedding spa..
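
A rough sketch of the two embedding ideas in these notes: every word in the sequence is looked up at once, with no lookup waiting on the previous word, and words with similar meanings end up close together. The vectors in `EMBEDDINGS` are hand-picked toy values chosen for illustration, not taken from any pretrained model, and the names `embed_sequence` and `cosine_similarity` are hypothetical helpers.

```python
import math

# Toy 3-d vectors with made-up values (assumption: NOT from a pretrained model).
EMBEDDINGS = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}


def embed_sequence(tokens):
    """Look up a vector for every token in one pass.

    Each lookup is independent of the others, so nothing here has to
    wait for a previous time step (unlike an RNN's step-by-step loop).
    """
    return [EMBEDDINGS[token] for token in tokens]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors = embed_sequence(["king", "queen", "apple"])
sim_close = cosine_similarity(vectors[0], vectors[1])  # king vs queen
sim_far = cosine_similarity(vectors[0], vectors[2])    # king vs apple
print(sim_close > sim_far)
```

With these toy values, "king" and "queen" score a higher cosine similarity than "king" and "apple", which is the sense in which similar words sit close to each other in embedding space.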