Universal GPU setup for PyTorch and JAX -- DirectML, ROCm, CUDA, MPS, CPU. Auto-detects hardware and sets HSA_OVERRIDE_GFX_VERSION for the AMD RX 5700 XT. Beats devicetorch + torchruntime combined.
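A minimal sketch of the detection order described above, assuming stock PyTorch plus the optional torch-directml package on Windows; the toolkit's actual implementation may differ:

```python
import torch

def pick_device():
    """Return the best available torch device: CUDA/ROCm, then MPS, then DirectML, then CPU."""
    if torch.cuda.is_available():                 # NVIDIA CUDA and AMD ROCm both surface here
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():    # Apple Silicon
        return torch.device("mps")
    try:
        import torch_directml                     # optional, Windows-only DirectML backend
        return torch_directml.device()
    except ImportError:
        return torch.device("cpu")

print(pick_device())
```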
Auto-detect AMD GPU for PyTorch -- ROCm, DirectML, CUDA, MPS, CPU. Sets HSA_OVERRIDE_GFX_VERSION for gfx1010 (RX 5700 XT) automatically. Windows/Linux/macOS/WSL2.
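On the ROCm path, the gfx1010 workaround amounts to exporting one environment variable before PyTorch loads. A hedged sketch -- the 10.3.0 value is the commonly used community override for RDNA1 cards, not an AMD-supported configuration:

```python
import os

# RX 5700 XT reports gfx1010, which official ROCm wheels ship no kernels for.
# Overriding to 10.3.0 (gfx1030) reuses the RDNA2 kernels; set it before torch is imported.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch

print(torch.cuda.is_available())            # ROCm GPUs are exposed through torch.cuda
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # should name the Radeon card if the override worked
```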
Run Ollama with an AMD GPU on Windows -- WSL2, Vulkan, and Docker methods. RX 5000/6000/7000/9000 series. The guide that should exist but doesn't.
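A quick way to confirm that an Ollama server (from any of the WSL2, Vulkan, or Docker setups) is up and generating at GPU-like speeds. The model name is only an example; the /api/generate endpoint and its eval_count/eval_duration fields are Ollama's standard HTTP API:

```python
import json
import urllib.request

payload = {"model": "llama3.2", "prompt": "Say hello.", "stream": False}  # example model
req = urllib.request.Request(
    "http://localhost:11434/api/generate",            # Ollama's default local endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    out = json.loads(resp.read())

tokens_per_sec = out["eval_count"] / (out["eval_duration"] / 1e9)  # eval_duration is in ns
print(out["response"][:80])
print(f"{tokens_per_sec:.1f} tokens/sec")   # single-digit values usually mean CPU fallback
```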
ROCm 5.x to 6.x migration guide -- parallel install, PyTorch compatibility matrix, breaking changes, rollback. The guide AMD should have written.
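When validating a 5.x-to-6.x migration, the first check is usually whether the installed PyTorch wheel matches the new HIP runtime. A small sketch using attributes that exist on ROCm builds of PyTorch:

```python
import torch

print(torch.__version__)            # ROCm wheels carry a +rocmX.Y suffix, e.g. 2.3.0+rocm6.0
print(torch.version.hip)            # HIP version the wheel was built against; None on CUDA/CPU builds
print(torch.cuda.is_available())    # ROCm devices are reported through the CUDA API
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```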
Reproducible GPU float32 benchmarks -- AMD DirectML 40.2x speedup on RX 5700 XT. Windows/Linux/macOS. torch-directml, ROCm, CUDA, MPS. All results verified on real hardware.
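A stripped-down sketch of that float32 comparison, assuming the torch-directml package is installed on Windows; timings vary with GPU and matrix size, so treat the published speedup as specific to the hardware above:

```python
import time
import torch
import torch_directml  # pip install torch-directml

def bench(device, n=4096, iters=10):
    a = torch.randn(n, n, dtype=torch.float32).to(device)
    b = a @ a                                   # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        b = a @ a
    _ = b[0, 0].item()                          # forces DirectML's async queue to drain
    return (time.perf_counter() - t0) / iters

cpu = bench(torch.device("cpu"))
gpu = bench(torch_directml.device())
print(f"CPU {cpu:.3f}s  GPU {gpu:.3f}s  speedup {cpu / gpu:.1f}x")
```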
Run JAX on an AMD GPU -- Windows DirectML and Linux/WSL2 ROCm. The missing setup guide for the RX 5000/6000/7000 series. vmap, grad, jit tested on real hardware.
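On the Linux/WSL2 ROCm path, once the ROCm jaxlib plugin is installed the standard transforms run on the GPU with no code changes; a minimal sketch, with device discovery being the only step that differs from a CUDA setup:

```python
import jax
import jax.numpy as jnp

print(jax.devices())                 # should list a GPU device when the ROCm plugin is active

@jax.jit
def loss(w, x):
    return jnp.sum(jnp.tanh(x @ w))

x = jnp.ones((512, 256), dtype=jnp.float32)
w = jnp.ones((256, 128), dtype=jnp.float32)

print(loss(w, x))                               # jit-compiled forward pass
print(jax.grad(loss)(w, x).shape)               # reverse-mode gradient w.r.t. w
print(jax.vmap(lambda row: row @ w)(x).shape)   # per-row matmul via vmap
```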
GPU vs CPU performance benchmarking for PyTorch and JAX. Works on AMD ROCm, DirectML, CUDA, MPS, CPU. Optimized for RX 5700 XT in WSL2.
Use Claude Code with free local AI models via Ollama -- Qwen, DeepSeek, Mistral. The setup guide Anthropic won't write. Save $5-30/month.
Local LLM inference on an AMD GPU -- llama.cpp Vulkan on Windows, no ROCm required
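From Python, the same Vulkan backend is reachable through llama-cpp-python, assuming a wheel or local build with the Vulkan backend enabled; the model path is a placeholder for any local GGUF file:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with the Vulkan backend)

llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder: any local GGUF model
    n_gpu_layers=-1,     # offload every layer to the GPU
    n_ctx=4096,
)
out = llm("Q: What GPU am I running on?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```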
Faster-Whisper on AMD GPUs via DirectML on Windows -- drop-in GPU transcription, no ROCm required
Run ComfyUI on AMD GPUs (RDNA1-RDNA4) on Windows -- comfyui-rocm, AMD Portable, and DirectML. Tested on the RX 5700 XT.
ONNX Runtime + DirectML on AMD GPUs -- GPU-accelerated inference on Windows, no CUDA, no ROCm
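The core of that setup is selecting the DirectML execution provider that ships in the onnxruntime-directml package; a minimal sketch with a placeholder model path:

```python
import numpy as np
import onnxruntime as ort  # pip install onnxruntime-directml

print(ort.get_available_providers())   # should include "DmlExecutionProvider"

sess = ort.InferenceSession(
    "model.onnx",                                           # placeholder path to an exported model
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # replace dynamic dims with 1
x = np.random.rand(*shape).astype(np.float32)
print(sess.run(None, {inp.name: x})[0].shape)
```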
Stable Diffusion / SDXL on AMD GPUs via DirectML on Windows -- no ROCm required
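One commonly documented route is diffusers' ONNX pipeline on top of the DirectML execution provider. A hedged sketch, assuming the diffusers and onnxruntime-directml packages; the checkpoint id and its onnx revision are only examples and must actually provide ONNX weights:

```python
from diffusers import OnnxStableDiffusionPipeline  # pip install diffusers onnxruntime-directml

pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint; swap in any SD 1.x ONNX export
    revision="onnx",                    # assumes ONNX weights are published under this revision
    provider="DmlExecutionProvider",    # routes inference through DirectML on the AMD GPU
)
image = pipe("a photo of a red bicycle on a beach", num_inference_steps=25).images[0]
image.save("out.png")
```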
Everything you need to run AI/ML on AMD GPUs on Windows -- master toolkit hub