Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, as well as GPUs via Intel oneAPI, AMD ROCm, Apple Metal, and Nvidia CUDA.
Reusable compiler infrastructure for Julia GPU backends.
CPU/GPU-portable arrays with parallel_for/parallel_reduce in Julia for productive science. Funded by the US DOE Advanced Scientific Computing Research (ASCR) program.