Most popular benchmark repositories and open source projects

BALROG

Benchmarking Agentic LLM and VLM Reasoning On Games

28   147   147  

MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large languag...

10   146   146  

bucketbench

Go-based framework for running benchmarks against Docker, containerd,...

38   146   146  

math-parser-benchmark-project

C++ Mathematical Expression Parser Benchmark

29   145   145  

7guis

7GUIs is a GUI programming usability benchmark.

18   145   145  

VPR-methods-evaluation

Easily download and evaluate pre-trained Visual Place Recognition meth...

16   144   144  

bsuccinct-rs

Rust libraries and programs focused on succinct data structures

10   144   144  

ossf-cve-benchmark

The OpenSSF CVE Benchmark consists of code and metadata for over 200 r...

38   144   144  

ecs

Build your own Game-Engine based on the Entity Component System concep...

11   144   144  

php-orm-benchmark

PHP ORM Benchmark

14   143   143  

EasyIterator

๐Ÿƒ Iterators made easy! Zero cost abstractions for designing and using...

8   143   143  

jsbench-me

jsbench.me - JavaScript performance benchmarking playground

2   143   143  

compiler-benchmark

Benchmarks compilation speeds of different combinations of languages a...

18   143   143  

ClassEval

Benchmark ClassEval for class-level code generation.

15   143   143  

MMToM-QA

[๐Ÿ†Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of...

18   143   143  

memory-maze

Evaluating long-term memory of reinforcement learning algorithms

16   142   142  

qpbenchmark

Benchmark for quadratic programming solvers available in Python

12   141   141  

iai-callgrind

High-precision and consistent benchmarking framework/harness for Rust

16   141   141  

benchmarks

Benchmark of open source, embedded, memory-mapped, key-value stores av...

22   141   141  

plf_nanotimer

A simple C++ 03/11/etc timer class for ~microsecond-precision cross-pl...

14   141   141  

TCPDBench

The Turing Change Point Detection Benchmark: An Extensive Benchmark Ev...

31   141   141  

service-mesh-benchmark

35   139   139  

deepchange

ICCV 2023, project page of the paper "DeepChange: A Long-term Person R...

5   139   139  

Windows-2019-CIS

Automated CIS Benchmark Compliance Remediation for Windows Server 2019...

76   139   139  

web-bench

Web-Bench is a benchmark designed to evaluate the performance of LLMs...

11   139   139  

VPR-datasets-downloader

Automatically download VPR datasets in a standard format

18   138   138  

wake-word-benchmark

wake word engine benchmark framework

28   137   137  

golang-benchmarks

Go(lang) benchmarks - (measure the speed of golang)

18   136   136  

goku

Goku is an HTTP load testing application written in Rust

5   136   136  

arewefastyet

NOT MAINTAINED ANYMORE! New project is located on https://github.com/m...

49   135   135  

typescript-orm-benchmark

โš–๏ธ ORM benchmarking for Node.js applications written in TypeScript

15   135   135  

serverless-faas-workbench

FunctionBench : A Suite of Workloads for Serverless Cloud Function Ser...

44   134   134  

actors

Evaluation of API and performance of different actor libraries

15   133   133  

JSONBench

JSONBench: a Benchmark For Data Analytics On JSON

13   133   133  

BRIGHT

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive...

15   133   133  

zBench

📊 zig benchmark

11   132   132  

Shot2Story

A new multi-shot video understanding benchmark Shot2Story with compreh...

7   132   132  

TeaStore

A micro-service reference test application for model extraction, cloud...

156   132   132  

video-quality-metrics

Uses FFmpeg to benchmark video encoders to compare VMAF, SSIM and PSNR...

20   131   131  

go-perftuner

Helper tool for manual Go code optimization.

4   130   130  

golang-graphql-benchmark

benchmark of golang GraphQL framework.

12   130   130  

THST

Templated hierarchical spatial trees designed for high-performance.

18   129   129  

docile

DocILE: Document Information Localization and Extraction Benchmark

9   129   129  

bench-node

A powerful Node.js benchmark library

8   129   129  

leaderboard

You can find the most recent KGQA benchmark numbers from publications...

18   128   128  

srs-benchmark

A benchmark for spaced repetition schedulers/algorithms

14   128   128  

V2X-Sim

[RA-L2022] V2X-Sim Dataset and Benchmark

14   127   127  

TSB-AD

TSB-AD: Towards A Reliable Time-Series Anomaly Detection Benchmark

28   127   127  

factory_bot_instruments

Instruments for benchmarking, tracing, and debugging Factory Girl mode...

10   126   126  

pddl-instances

๐ŸŒ PDDL instances covering the International Planning Competitions

57   126   126