Most popular benchmark repositories and open source projects

coir CoIR-team Python

(ACL 2025 Main) A Comprehensive Benchmark for Code Information Retrieval.

151 15 151

plf_nanotimer mattreecebentley C++

A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and as simple as possible...

151 14 151

NAS-Benchmark antoyang Python

[ICLR 2020] NAS evaluation is frustratingly hard

150 24 150

EasyIterator TheLartians C++

🏃 Iterators made easy! Zero cost abstractions for designing and using C++ iterators.

150 9 6

pddl-instances potassco Common Lisp

🌍 PDDL instances covering the International Planning Competitions

150 61 150

serverless-faas-workbench ddps-lab Python

FunctionBench : A Suite of Workloads for Serverless Cloud Function Service

148 48 4

Windows-2019-CIS ansible-lockdown YAML

Automated CIS Benchmark Compliance Remediation for Windows Server 2019 with Ansible

148 78 148

rvv-bench camel-cdr Assembly

A collection of RISC-V Vector (RVV) benchmarks to help developers write portably performant RVV code

148 36 148

mqperf softwaremill Scala

147 36 14

bucketbench estesp Go

Go-based framework for running benchmarks against Docker, containerd, runc, or any CRI-compliant runtime

147 38 9

xVerify IAAR-Shanghai Jupyter Notebook

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

147 7 147

aurora wenhaochai Python

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

147 6 2

CharXiv princeton-nlp Python

[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

147 16 147

aacr-bench alibaba Python

An Alibaba open-source multi-language benchmark for evaluating LLMs in repository-level automatic code review, featuring an AI-assisted and expert-ver...

147 10 147

ClassEval FudanSELab Python

Benchmark ClassEval for class-level code generation.

146 16 146

gameworld gameworld-project Python

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

146 5 146

goku jcaromiq Rust

Goku is an HTTP load testing application written in Rust

146 6 2

docile rossumai Python

DocILE: Document Information Localization and Extraction Benchmark

146 12 146

golang-benchmarks SimonWaldherr Go

Go(lang) benchmarks - (measure the speed of golang)

145 19 145

space_robotics_bench AndrejOrsula Python

Robot Learning Beyond Earth

145 20 145

TCPDBench alan-turing-institute

The Turing Change Point Detection Benchmark: An Extensive Benchmark Evaluation of Change Point Detection Algorithms on real-world data

144 31 144

php-orm-benchmark kenjis PHP

PHP ORM Benchmark

143 14 10

benchmarks lmdbjava Shell

Benchmark of open source, embedded, memory-mapped, key-value stores available from Java (JMH)

143 22 143

video-quality-metrics CrypticSignal Python

Uses FFmpeg to benchmark video encoders to compare VMAF, SSIM and PSNR with different encoder settings.

143 21 5

leaderboard KGQA Jupyter Notebook

You can find the most recent KGQA benchmark numbers from publications here.

143 18 143

jsbench-me psiho

jsbench.me - JavaScript performance benchmarking playground

142 2 142

smartbugs-curated smartbugs Solidity

SB Curated is a curated dataset of Solidity smart contracts annotated with tagged vulnerabilities. The dataset was created to evaluate the accuracy of...

142 36 142

V2X-Sim ai4ce

[RA-L2022] V2X-Sim Dataset and Benchmark

142 18 142

SpeedTests jabbalaci Python

Comparing the execution speeds of various programming languages

142 43 142

service-mesh-benchmark kinvolk Shell

141 36 141

ElegantMustard lscambo13

An elegant RTSS Overlay to showcase your benchmark stats in style.

139 0 139

EmoBench-M Emo-gml Python

EmoBench-M: A benchmark for evaluating Emotional Intelligence in Multimodal Large Language Models.

138 13 138

Video-Bench PKU-YuanGroup Python

A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!

138 3 138

facies_classification_benchmark yalaudah Python

The repository includes PyTorch code, and the data, to reproduce the results for our paper titled "A Machine Learning Benchmark for Facies Classificat...

138 65 138

typescript-orm-benchmark emanuelcasco TypeScript

⚖️ ORM benchmarking for Node.js applications written in TypeScript

137 15 137

VCSL alipay Python

Video Copy Segment Localization (VCSL) dataset and benchmark [CVPR2022]

137 19 137

go-perftuner go-perf Go

Helper tool for manual Go code optimization.

137 5 3

Touchstone MrGiovanni Jupyter Notebook

[NeurIPS 2024] Touchstone - Benchmarking AI on 5,172 o.o.d. CT volumes and 9 anatomical structures

136 4 136

chembench lamalab-org Python

How good are LLMs at chemistry?

136 16 136

arewefastyet mozilla JavaScript

NOT MAINTAINED ANYMORE! New project is located on https://github.com/mozilla-frontend-infra/js-perf-dashboard -- AreWeFastYet is a set of tools used f...

135 49 2

PersonaMem bowen-upenn Python

[COLM 2025] Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

135 8 135

Domain-generalization-fault-diagnosis-benchmark CHAOZHAO-1 Python

This is a benchmark for domain generalization-based fault diagnosis (code related to domain generalization)

135 12 135

PHP-Frameworks-Bench myaaghubi PHP

A library to make benchmarks from PHP frameworks.

135 5 8

awesome-world-model-evolution OpenRaiser

A curated collection of research papers, models, and resources tracing the evolution from specialized models to unified world models.

134 11 134

xbench-evals xbench-ai Python

Evergreen, contamination-free, real-world, domain-specific AI evaluation framework

134 7 134

actors plokhotnyuk Scala

Evaluation of API and performance of different actor libraries

132 15 13

golang-graphql-benchmark appleboy Go

benchmark of golang GraphQL framework.

132 12 132

jvm-performance-benchmarks ionutbalosin Java

Java Virtual Machine (JVM) Performance Benchmarks with a primary focus on top-tier Just-In-Time (JIT) Compilers, such as C2 JIT, Graal JIT, and the Fa...

132 15 5

THST tuxalin C++

Templated hierarchical spatial trees designed for high performance.

132 18 132

contender flashbots Rust

spam EVM execution nodes over JSON-RPC & run benchmarks

132 49 132

benchmark

Repositories (1763)