π Implements Flash Attention with sink for gpt-oss-20b; includes test.py. WIP backward pass, varlen support, and community sync to return softmax_lse only.
What is the kelvindelrosario/flash-attention-with-sink GitHub project? Description: "π Implements Flash Attention with sink for gpt-oss-20b; includes test.py. WIP backward pass, varlen support, and community sync to return softmax_lse only.". Written in Python. Explain what it does, its main use cases, key features, and who would benefit from using it.
Question is copied to clipboard β paste it after the AI opens.
Clone via HTTPS
Clone via SSH
Download ZIP
Download master.zipReport bugs or request features on the flash-attention-with-sink issue tracker:
Open GitHub Issues