Most popular benchmark repositories and open source projects

kubernetes-iperf3

Simple wrapper around iperf3 to measure network bandwidth from all nod...

39   108   108  

marbert

UBC ARBERT and MARBERT Deep Bidirectional Transformers for Arabic

15   108   108  

dbbench

🏋️ dbbench is a simple database benchmarking tool which supports sever...

20   107   107  

globalping-probe

The globalping probe code that runs on your hardware and connects to t...

27   107   107  

xVerify

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

7   107   107  

WorldScore

Official implementation for WorldScore: A Unified Evaluation Benchmark...

6   106   106  

kaggle-dogs-vs-cats-caffe

Kaggle dogs vs cats solution in Caffe

60   106   106  

EvalNE

Source code for EvalNE, a Python library for evaluating Network Embedd...

26   106   106  

video_object_detection_paper

Updates on some video object detection papers (collection of video object detection papers and code...

7   105   105  

gpumembench

A GPU benchmark suite for assessing on-chip GPU memory bandwidth

26   105   105  

ORBIT-Dataset

The ORBIT dataset is a collection of videos of objects in clean and cl...

23   105   105  

solidity-benchmarks

Benchmarks of popular contract implementations in solidity

11   105   105  

peaks-consolidation

The Peaks Consolidation is equipped with state-of-the-art algorithms a...

8   105   105  

OpenS2V-Nexus

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subj...

0   105   105  

deepmark

Deepmark AI enables a unique testing environment for language models (...

2   104   104  

tastylib

C++ implementations of data structures, algorithms, and system designs...

29   104   104  

XRAutomatedTests

XRAutomatedTests is where you can find functional, graphics, performan...

19   104   104  

pytorch-benchmark

Easily benchmark PyTorch model FLOPs, latency, throughput, allocated g...

11   103   103  

RHEL8-STIG

Automated STIG Benchmark Compliance Remediation for RHEL 8 with Ansibl...

61   103   103  

dm_nevis

NEVIS'22: Benchmarking the next generation of never-ending learners

6   102   102  

annbench

A lightweight benchmark for approximate nearest neighbor search

16   102   102  

playwright-test

Run unit tests with several test runners or benchmark inside real brow...

13   102   102  

mini-nbody

A simple gravitational N-body simulation in less than 100 lines of C c...

27   101   101  

benchmark-websocket

Websocket Client and Server for benchmarks with Millions of concurrent...

15   101   101  

zk-Harness

Benchmarking framework for general purpose zero-knowledge proofs langu...

21   101   101  

NoLiMa

Official repository for "NoLiMa: Long-Context Evaluation Beyond Litera...

5   101   101  

endless-memory-gym

Challenging Memory-based Deep Reinforcement Learning Agents

2   100   100  

FollowBench

[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Followi...

15   100   100  

pplbench

Evaluation Framework for Probabilistic Programming Languages

22   100   100  

mqtt-mock

MQTT stress-testing tool. Supports subscribe and publish stress-test modes, and supports simulating a number of client connections.

34   100   100  

tpch-spark

TPC-H queries in Apache Spark SQL using native DataFrames API

81   99   99  

datacenter-speed-tests

⚡ Test speed and pings to all DigitalOcean, Linode, AWS, GCP, and Vul...

11   99   99  

smartbugs-curated

SB Curated is a curated dataset of Solidity smart contracts annotated...

25   99   99  

php-arrays-in-memory-comparison

How to store 11kk items in memory? Comparison of methods: array vs obj...

4   98   98  

best

🏆 Delightful Benchmarking & Performance Testing

9   97   97  

yjit-bench

Set of benchmarks for the YJIT CRuby JIT compiler and other Ruby imple...

25   97   97  

PPM

A High-Quality Photography Portrait Matting Benchmark

11   97   97  

ping_pong_bench

A benchmark for role-playing language models

7   97   97  

MMC

[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM In...

4   97   97  

VisualNews-Repository

[EMNLP'21] Visual News: Benchmark and Challenges in News Image Caption...

9   96   96  

unsafe

Assorted java classes that make use of sun.misc.Unsafe

30   96   96  

pglib-uc

Benchmarks for the Unit Commitment Problem

33   95   95  

coir

(ACL 2025 Main) A Comprehensive Benchmark for Code Information Retrie...

9   95   95  

mPLUG-HalOwl

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

2   95   95  

DSRL

🔥 Datasets and env wrappers for offline safe reinforcement learning

6   95   95  

WorfBench

[ICLR 2025] Benchmarking Agentic Workflow Generation

4   94   94  

GLUE-X

We leverage 14 datasets as OOD test data and conduct evaluations on 8...

2   93   93  

nyt-connections

Benchmark that evaluates LLMs using 651 NYT Connections puzzles extend...

5   93   93  

PAD

[NeurIPS 2023] A Dataset and Benchmark for Pose-agnostic Anomaly Detec...

7   93   93  

LaBench

Latency Benchmarking tool

16   93   93