ROCm-compatible fork of Bittensor — full PyTorch 2.4 ROCm support; Wallet, Metagraph, and Dendrite fully working.
AMD/NVIDIA GPU cluster infrastructure — ~300-GPU deployment, ROCm, kernel tuning, multi-node benchmarking
Production-grade local LLM deployment stack — llama.cpp, Ollama, GGUF/GGML, ROCm on AMD GPUs, 14B to 80B models
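As a rough sizing aid for the 14B–80B range above, a minimal footprint estimator for GGUF-quantized models. The bits-per-weight figures are approximate community rules of thumb for common llama.cpp quant types (my assumption, not values from this stack; real files add small per-tensor overhead):

```python
# Rough disk/VRAM footprint estimator for GGUF-quantized models.
# Bits-per-weight values are approximate rules of thumb for common
# llama.cpp quant types, not exact format constants.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def gguf_size_gib(n_params: float, quant: str) -> float:
    """Approximate model size in GiB for a parameter count and quant type."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 2**30  # bits -> bytes -> GiB

for params, label in [(14e9, "14B"), (80e9, "80B")]:
    for quant in ("Q4_K_M", "Q8_0"):
        print(f"{label} @ {quant}: ~{gguf_size_gib(params, quant):.1f} GiB")
```

Useful as a first-pass check of whether a given quant of a model fits a card's VRAM before downloading it.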
Production infrastructure scripts — ROCm setup, multi-GPU config, server hardening, LLM deployment automation
FLUX.1-dev on AMD Radeon consumer GPUs — fast, low-VRAM, and shippable. Backport patches + benchmarks for torchao + diffusers group_offload on ROCm.
Multi-GPU tensor/context parallel diffusion on AMD ROCm — with the patch that makes it actually work.
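The tensor-parallel half of the line above comes down to sharding weight matrices across GPUs. A dependency-free sketch of column-parallel matmul, with devices simulated as plain Python lists — the actual repo does this with torch.distributed on ROCm, which this illustration does not reproduce:

```python
# Column-parallel linear layer, the core idea behind tensor parallelism:
# each "device" holds a column shard of W, computes its partial output
# independently, and the results are concatenated (an all-gather).

def matmul(x, w):
    """x: vector of length k; w: k x n matrix (list of rows) -> length-n vector."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]

def shard_columns(w, n_devices):
    """Split a k x n weight matrix into n_devices equal column shards."""
    step = len(w[0]) // n_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(n_devices)]

def column_parallel_matmul(x, w, n_devices):
    """Each device multiplies against its shard; concat stands in for all-gather."""
    partials = [matmul(x, shard) for shard in shard_columns(w, n_devices)]
    return [v for part in partials for v in part]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
assert column_parallel_matmul(x, w, 2) == matmul(x, w)
```

Context (sequence) parallelism is the complementary split — partitioning the token/latent dimension rather than the weights — and is not shown here.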