In this project, the attention mechanism is implemented in CUDA using shared memory, coalesced global-memory access, warp-shuffle reductions, and tiling.
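The sketch below is a minimal illustration of how these four techniques can fit together in one kernel; it is not the project's actual implementation. The kernel name `attentionRow`, the helpers `warpMax`/`warpSum`, the one-warp-per-query-row layout, and the assumptions (single head, row-major float32 `Q`/`K`/`V`, sequence length `N` a multiple of the 32-key tile, all `N` scores fitting in shared memory) are illustrative choices, not taken from this repository.

```cuda
// A minimal sketch, NOT this project's kernel. Assumes single-head attention,
// row-major float32 Q[M][d], K[N][d], V[N][d], O[M][d]; N a multiple of the
// 32-key tile; all N scores fit in shared memory.
#include <cuda_runtime.h>
#include <cfloat>

constexpr int TILE = 32;  // keys per shared-memory tile (= warp size)

// Warp-shuffle reductions: the max/sum never touch shared or global memory.
__device__ float warpMax(float v) {
    for (int o = 16; o > 0; o >>= 1)
        v = fmaxf(v, __shfl_down_sync(0xffffffffu, v, o));
    return v;
}
__device__ float warpSum(float v) {
    for (int o = 16; o > 0; o >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, o);
    return v;
}

// One 32-thread block (a single warp) computes one output row of O.
// Launch: attentionRow<<<M, 32, (d + TILE * d + N) * sizeof(float)>>>(...)
__global__ void attentionRow(const float* Q, const float* K, const float* V,
                             float* O, int N, int d) {
    extern __shared__ float smem[];
    float* qs = smem;            // query row:      d floats
    float* ks = qs + d;          // current K tile: TILE * d floats
    float* sc = ks + TILE * d;   // scores for all N keys
    const int row   = blockIdx.x;
    const int lane  = threadIdx.x;
    const float scale = rsqrtf((float)d);

    // Stage the query row in shared memory; consecutive lanes read
    // consecutive addresses, so the global load is coalesced.
    for (int i = lane; i < d; i += 32) qs[i] = Q[row * d + i];

    // Tiled pass over the keys: cooperatively load TILE rows of K with
    // coalesced reads, then each lane scores one key against the query.
    for (int t = 0; t < N; t += TILE) {
        __syncthreads();  // previous tile's reads must finish first
        for (int i = lane; i < TILE * d; i += 32) ks[i] = K[t * d + i];
        __syncthreads();
        float dot = 0.f;
        for (int i = 0; i < d; ++i) dot += qs[i] * ks[lane * d + i];
        sc[t + lane] = dot * scale;
    }
    __syncthreads();

    // Softmax over the score row: both reductions stay in registers via
    // warp shuffles; lane 0's result is then broadcast back to the warp.
    float m = -FLT_MAX;
    for (int j = lane; j < N; j += 32) m = fmaxf(m, sc[j]);
    m = __shfl_sync(0xffffffffu, warpMax(m), 0);

    float s = 0.f;
    for (int j = lane; j < N; j += 32) {
        float e = __expf(sc[j] - m);
        sc[j] = e;
        s += e;
    }
    s = __shfl_sync(0xffffffffu, warpSum(s), 0);
    __syncthreads();  // make the exponentiated scores visible to all lanes

    // Weighted sum of V: at each key j, lanes read consecutive elements of
    // V's row, so these global loads are coalesced as well.
    for (int i = lane; i < d; i += 32) {
        float acc = 0.f;
        for (int j = 0; j < N; ++j) acc += sc[j] * V[j * d + i];
        O[row * d + i] = acc / s;
    }
}
```

The one-warp-per-row layout is chosen here so the softmax max and sum reductions can stay entirely in registers through `__shfl_down_sync`. A fuller kernel would typically also tile over `V` and fuse the softmax online so the full score row never has to sit in shared memory; the two-pass version above trades that for readability.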