Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses shared-distributed memory model to keep GPUs updated fast while using same kernel on all devices(for simplicity).
A low-latency LRU approximation cache in C++ using CLOCK second-chance algorithm. Multi level cache too. Up to 2.5 billion lookups per second.
Simple load-balancing library for balancing GPGPU workloads between a GPU and a CPU or any number of devices in a computer or multiple computers.