
Attention Mechanism Explained

We saw the Encoder-Decoder architecture in the previous blog. Now let's check whether we can do any better than passing the encoder's last timestep outputs to the first timestep of the decoder.
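To make the contrast concrete, here is a minimal sketch in plain Python. It is not the assignment code and not necessarily the scoring function used in the paper. The toy vectors, the names `encoder_states` and `decoder_state`, and the dot-product score are all assumptions chosen only for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy values: one encoder hidden state per input timestep.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical current decoder hidden state.
decoder_state = [1.0, 0.0]

# Plain encoder-decoder: the decoder only receives the last encoder state.
context_last = encoder_states[-1]

# Attention (dot-product scoring is an assumption here): score every
# encoder state against the decoder state, normalize the scores with
# softmax, and take the weighted sum of all encoder states.
scores = [dot(h, decoder_state) for h in encoder_states]
weights = softmax(scores)
context_attn = [
    sum(w * h[i] for w, h in zip(weights, encoder_states))
    for i in range(len(decoder_state))
]
```

With attention, the context is recomputed for each decoder step from all encoder timesteps, instead of being fixed to the last one.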

Most of this blog will be based on this paper.


It's not mandatory to go through this paper; I will try to simplify things. But it would be very nice if you could spend some time going through it.

Most of this blog will be based on images that I created using PowerPoint.

credit: https://guillaumegenthial.github.io

Note: I am not including the full code here, as it is part of the assignments of appliedaicourse.