Build and run Docker containers leveraging NVIDIA GPUs
State-of-the-art deep learning scripts organized by model - easy to train and deploy, with reproducible accuracy and performance on enterprise-grade infrastructure.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
NeMo: a toolkit for conversational AI
Synthesizing and manipulating 2048x1024 images with conditional GANs
A Python framework for GPU-accelerated simulation, robotics, and machine learning.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Optimized primitives for collective multi-GPU communication
Deep Learning GPU Training System
NVIDIA device plugin for Kubernetes
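As a minimal sketch of how the device plugin is consumed: once deployed, it advertises GPUs to the cluster as the extended resource `nvidia.com/gpu`, which pods then request through resource limits. The pod name and image tag below are illustrative, not prescribed by the plugin.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1       # requests one GPU via the plugin's extended resource
```

A pod that omits the `nvidia.com/gpu` limit is scheduled without GPU access; the limit is how the kubelet knows to allocate a device through the plugin.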
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, delivering better performance with lower memory utilization in both training and inference.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Open-source framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
CUDA Core Compute Libraries
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
AIStore: scalable storage for AI applications
Deep learning for recommender systems
Toolkit for efficient experimentation with Speech Recognition, Text-to-Speech, and NLP
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
Fast and accurate object detection with end-to-end GPU optimization
CUDA Kernel Benchmarking Library
GPU accelerated decision optimization
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).