Most popular benchmark repositories and open source projects

Youku-mPLUG X-PLUG Python

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

307 10 4

TSB-AD thedatumorg Python

Time-Series Anomaly Detection | Algorithms + Datasets + Tutorials

306 65 6

bark bark-simulator C++

Open-Source Framework for Development, Simulation and Benchmarking of Behavior Planning Algorithms for Autonomous Driving

306 72 14

CARLA carla-recourse Python

CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms

304 64 5

benchflow benchflow-ai Python

Research infra for creating RL environments, post-training, and evals

302 38 2

ecs_benchmark abeimler C++

Benchmarks of common ECS (Entity-Component-System)-Frameworks in C++ (or C)

301 19 4

HeCBench ORNL C++

300 117 4

elimination_game lechmazur

A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations,...

300 11 6

BLUE_Benchmark ncbi-nlp Python

BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora.

298 41 11

WorldScore haoyi-duan Python

Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation

298 20 3

gungraun gungraun Rust

High-precision, one-shot and consistent benchmarking framework/harness for Rust. All Valgrind tools at your fingertips.

298 24 2

rust-web-benchmarks programatik29 Rust

Benchmarking web frameworks written in Rust with the rewrk tool.

297 42 7

perfops-cli ProspectOne Go

A simple command line tool to interact with hundreds of servers around the world.

296 48 22

trajnetplusplusbaselines vita-epfl Python

[ITS'21] Human Trajectory Forecasting in Crowds: A Deep Learning Perspective

296 84 11

Face-Renovation Lotayou Python

Official repository of the paper "HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment".

292 49 14

dbtester etcd-io Go

Distributed database benchmark tester

291 46 21

awesome-diffusion-v2v wenhao728 Python

Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation. And a vid...

291 10 4

Turbo-Run-Length-Encoding powturbo C

TurboRLE-Fastest Run Length Encoding

290 29 12

picows tarasko Python

Ultra-fast websocket client and server for asyncio

290 18 4

CFDBench luo-yining Python

A large-scale benchmark for machine learning methods in fluid dynamics

290 39 6

DeepFund HKUSTDial Python

🔥[NeurIPS'25] DeepFund: Pilot for Your Next Fund Investment

288 48 4

STPLS3D meidachen Python

🔥 Synthetic and real-world 2d/3d dataset for semantic and instance segmentation (BMVC 2022 Oral)

288 23 9

benchexec sosy-lab Python

BenchExec: A Framework for Reliable Benchmarking and Resource Measurement

287 227 13

minebench Ammaar-Alam TypeScript

Minecraft-style voxel benchmark for comparing AI models (Arena + Sandbox)

286 21 2

java-object-mapper-benchmark arey Java

JMH benchmark of Java object-to-object mapping frameworks

285 54 15

pantheon StanfordSNR Python

Pantheon of Congestion Control

285 149 22

web-bench bytedance JavaScript

Web-Bench is a benchmark designed to evaluate the performance of LLMs in actual Web development.

285 29 5

RHEL7-STIG ansible-lockdown YAML

Automated STIG Benchmark Compliance Remediation for RHEL 7 with Ansible

282 146 33

elbencho breuner C++

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

282 42 15

Large-Scale-Medical Luffy03 Python

[TPAMI 2026] Large-Scale 3D Medical Image Pre-training with Geometric Context Priors

281 17 5

tf-metal-experiments tlkh Jupyter Notebook

TensorFlow Metal Backend on Apple Silicon Experiments (just for fun)

280 31 16

goTemplateBenchmark slinso Go

Comparing the performance of different template engines

277 23 8

DS-1000 xlang-ai Python

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

276 31 7

ProteinWorkshop a-r-j Python

Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/...

275 22 6