Most popular benchmark repositories and open source projects

optunahub-registry optuna Jupyter Notebook

The registry of the OptunaHub packages

52 57 52

OneEval XChen-Zero Python

OneEval: Open EvalScope evaluation artifacts for LLMs — subset breakdowns, pass@k curves, and reproducible evaluation protocols.

52 0 52

rtb processone Erlang

Benchmarking tool to stress real-time protocols

51 6 51

pspdfkit-webassembly-benchmark PSPDFKit-labs JavaScript

WebAssembly real-world performance benchmark — iswebassemblyfastyet.com

51 6 51

docker-examples cockroachlabs-field Shell

CockroachDB examples using Docker and Docker Compose

51 16 51

hyperspectral-soilmoisture-dataset felixriese Jupyter Notebook

Hyperspectral and soil-moisture data from a field campaign based on a soil sample. Karlsruhe (Germany), 2017.

51 13 51

OpenMEVA thu-coai Python

Benchmark for evaluating open-ended generation

51 7 51

nl2code-dataset aixcoder-plugin Java

Aix-bench, the Java benchmark for code synthesis problems.

51 1 51

Performance-Wars-Benchmarking-CSharp mjebrahimi C#

🔥Performance Wars Benchmarking C# - This repository contains a collection of C# benchmarks to compare the performance of different approaches to solv...

51 7 51

results embeddings-benchmark Python

Data for the MTEB leaderboard

51 146 51

MedAgentBoard yhzhu99 Python

[NeurIPS 2025] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

51 3 51

LakeBench microsoft Python

A multi-modal Python library for benchmarking lakehouse engines and ELT scenarios, supporting both industry-standard and novel benchmarks.

51 17 51

spatial_index_benchmark mloskot C++

Simple non-academic performance comparison of available open source implementations of R-tree spatial index using linear, quadratic and R* balancing a...

50 11 50

weather4cast iarai Jupyter Notebook

Code accompanying our IARAI Weather4cast Challenge

50 17 50

Slime-Mould-Algorithm-A-New-Method-for-Stochastic-Optimization- aliasgharheidaricom MATLAB

In this paper, a new stochastic optimizer, called the slime mould algorithm (SMA), is proposed based upon the oscillation mode of slime mould in...

50 12 50

Frame-Time-Analysis BoringBoredom TypeScript

Web application that charts and compares multiple frame time logs at the same time. Compatible with FPS benchmarking programs such as PresentMon, OCAT...

50 4 50

ReportBench ByteDance-BandAI Python

A comprehensive benchmark for evaluating deep research agents on academic survey tasks

50 4 50

SeePhys AI4Phys Python

[NeurIPS 2025] Official implementation for the paper "SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning"

50 0 50

MMFakeBench liuxuannan Python

[ICLR 2025] MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

50 0 50

docker-ab jig Dockerfile

Apache Benchmark Docker image

49 15 49

SQL-ProcBench microsoft TSQL

SQL-ProcBench is an open benchmark for procedural workloads in RDBMSs.

49 10 49

Unity-Pathfinding-Jobs-StressTest CristianQiu C#

Unity project showcasing A* pathfinding, fully jobified & burst compiled. It also contains examples of RaycastCommand and BoxcastCommand that are used...

49 9 49

sqlite-bench ukontainer C

SQLite Benchmark

49 27 49

FLBenchmark-toolkit AI-secure Python

Federated Learning Framework Benchmark (UniFed)

49 6 49

Horreum Hyperfoil Java

Benchmark results repository service

49 35 49

python-package-manager-shootout lincolnloop Makefile

Benchmarking various Python package managers

49 14 49

react-native-css-in-js-benchmarks brunolemos JavaScript

CSS in JS Benchmarks for React Native

48 11 48

scRNAseq_cell_cluster_labeling jdime Perl

Scripts to run and benchmark scRNA-seq cell cluster labeling methods

48 14 48

fizzboom darklang F#

Benchmark to compare async web server + interpreter + web client implementations across various languages

48 8 48

rust-zero-cost-abstractions mike-barber C#

Testing out a Zero Cost Abstraction in Rust compared to similar approaches in C# and Java

48 1 48

heatwave-tpch oracle

SQL scripts for HeatWave benchmarking

48 13 48

SeasonDepth SeasonDepth Python

This package provides a Python toolkit for evaluation on the "SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Mul...

48 5 48