Attention in CUDA

In this project, the attention mechanism is implemented in CUDA using shared memory, coalesced memory access, warp shuffles, and tiling.

GPU Memory Architecture

Matrix Multiplication

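A tiled matrix multiply is the core building block here: each thread block stages a tile of A and a tile of B in shared memory so that every global element is read once per tile, and loads are coalesced across a warp. The kernel below is a minimal sketch of this technique; the name `matmul_tiled` and the tile size are illustrative, not the project's actual API.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Tiled matrix multiply C = A * B, A: MxK, B: KxN, all row-major.
// Each block computes one TILE x TILE tile of C from shared-memory tiles.
__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        // Guarded, coalesced loads: consecutive threadIdx.x reads
        // consecutive global addresses.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reads before the tile is overwritten
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

With this scheme each element of A and B is fetched from global memory K/TILE times instead of K times, which is what tiling buys.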

Softmax
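A numerically stable softmax needs two reductions per row (max, then sum of exponentials); these map naturally onto warp shuffles, which exchange registers between lanes without touching shared memory. The kernel below is a sketch assuming one warp per row with a strided loop over columns; the name `softmax_rows` is assumed for illustration.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Row-wise softmax, one warp (32 threads) per row.
// Launch as softmax_rows<<<nrows, 32>>>(in, out, nrows, ncols).
__global__ void softmax_rows(const float* in, float* out,
                             int nrows, int ncols) {
    int row  = blockIdx.x;
    int lane = threadIdx.x;  // 0..31
    if (row >= nrows) return;

    // 1) Row maximum, for numerical stability.
    float m = -INFINITY;
    for (int c = lane; c < ncols; c += 32)
        m = fmaxf(m, in[row * ncols + c]);
    for (int off = 16; off > 0; off >>= 1)          // butterfly reduction
        m = fmaxf(m, __shfl_xor_sync(0xffffffffu, m, off));

    // 2) Sum of exp(x - max), reduced the same way.
    float s = 0.0f;
    for (int c = lane; c < ncols; c += 32)
        s += expf(in[row * ncols + c] - m);
    for (int off = 16; off > 0; off >>= 1)
        s += __shfl_xor_sync(0xffffffffu, s, off);

    // 3) Normalize; after the reductions every lane holds m and s.
    for (int c = lane; c < ncols; c += 32)
        out[row * ncols + c] = expf(in[row * ncols + c] - m) / s;
}
```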

Transpose
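A naive transpose coalesces either its reads or its writes, never both. The standard fix is to stage a square tile in shared memory: read it row-wise (coalesced), then write the transposed tile row-wise (also coalesced). The sketch below follows that pattern; the `+1` padding on the tile avoids shared-memory bank conflicts when columns are read back. Dimensions and the kernel name are illustrative.

```cuda
#include <cuda_runtime.h>

#define TILE_DIM   32
#define BLOCK_ROWS 8

// Coalesced transpose of a width x height row-major matrix.
// Launch with blocks of (32, 8) threads; each block moves a 32x32 tile.
__global__ void transpose(const float* in, float* out,
                          int width, int height) {
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];  // +1: no bank conflicts

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && (y + j) < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];
    __syncthreads();

    // Swap block indices so the output write is also coalesced.
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && (y + j) < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}
```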

Multi-Head Attention Mechanism

Attention Mechanism
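Putting the pieces together, single-head attention is Out = softmax(Q Kᵀ / √d) V, with Q, K, V of shape (seq, d); multi-head attention runs this per head on sliced projections. The host-side sketch below shows one plausible composition from kernels like those described above. All kernel names, launch shapes, and the pre-allocated scratch buffers (`dKT` for Kᵀ, `dS` for the score matrix) are assumptions for illustration, not the project's actual API.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical kernels, declared here so the sketch is self-contained.
__global__ void transpose(const float*, float*, int width, int height);
__global__ void matmul_tiled(const float*, const float*, float*,
                             int M, int N, int K);
__global__ void scale(float* x, int n, float s);
__global__ void softmax_rows(const float*, float*, int nrows, int ncols);

// Single-head attention: dOut = softmax(dQ * dK^T / sqrt(d)) * dV.
// dQ, dK, dV: (seq, d) row-major device buffers; dKT, dS: scratch.
void attention(const float* dQ, const float* dK, const float* dV,
               float* dOut, float* dKT, float* dS, int seq, int d) {
    dim3 blk(16, 16);
    dim3 gridS((seq + 15) / 16, (seq + 15) / 16);  // scores: (seq, seq)
    dim3 gridO((d + 15) / 16, (seq + 15) / 16);    // output: (seq, d)

    // 1) K^T, so Q K^T becomes a standard row-major matmul.
    transpose<<<dim3((d + 31) / 32, (seq + 31) / 32), dim3(32, 8)>>>(
        dK, dKT, d, seq);
    // 2) S = Q * K^T.
    matmul_tiled<<<gridS, blk>>>(dQ, dKT, dS, seq, seq, d);
    // 3) Scale by 1/sqrt(d), then row-wise softmax.
    scale<<<(seq * seq + 255) / 256, 256>>>(dS, seq * seq,
                                            1.0f / sqrtf((float)d));
    softmax_rows<<<seq, 32>>>(dS, dS, seq, seq);
    // 4) Out = S * V.
    matmul_tiled<<<gridO, blk>>>(dS, dV, dOut, seq, d, seq);
}
```

For multi-head attention, the same pipeline runs once per head over that head's (seq, d_head) slices of Q, K, and V, and the per-head outputs are concatenated before the final output projection.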