Topic

benchmark

Repositories (1763)

ASR_benchmark
ASR_benchmark Franck-Dernoncourt Python

Program to benchmark various speech recognition APIs

81
evoeval
evoeval evo-eval Python

EvoEval: Evolving Coding Benchmarks via LLM

81
ruby-performance-tools
ruby-performance-tools JuanitoFatas

List of Ruby tools for performance work.

81
indonlg
indonlg IndoNLP Python

The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained I...

80
cpu-micro-benchmarks
cpu-micro-benchmarks jiegec Assembly

CPU micro benchmarks

80
gobench
gobench gobench-io HTML

A benchmark framework based on Golang

80
arewefastyet
arewefastyet vitessio Go

Automated Benchmarking System for Vitess

80
DotNet-Collections-Benchmark
DotNet-Collections-Benchmark mjebrahimi C#

🚀 A comprehensive performance comparison benchmark between different .NET collections.

80
vector-db-benchmark
vector-db-benchmark myscale Python

Framework for benchmarking fully-managed vector databases

80
NeurIPS24-Terra
NeurIPS24-Terra CityMind-Lab Jupyter Notebook

[NeurIPS 2024] Terra: A Multimodal Spatio-Temporal Dataset Spanning the Earth

79
lhbench
lhbench lhbench Scala

Lakehouse storage system benchmark

79
sorry-bench
sorry-bench SORRY-Bench Jupyter Notebook

Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)

79
Human-Benchmark
Human-Benchmark PrintN Dart

Human Benchmark is a Flutter app for Android that features many tests to assess your abilities.

79
MEGA-Bench
MEGA-Bench TIGER-AI-Lab Python

This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]

79
php-di-container-benchmarks
php-di-container-benchmarks kocsismate PHP

Benchmark for some popular PHP Dependency Injection Containers.

78
SUES-200-Benchmark
SUES-200-Benchmark Reza-Zhu Python

SUES-200: A Multi-height Multi-scene Cross-view Image Benchmark Across Drone and Satellite

78
LruClockCache
LruClockCache tugrul512bit C++

A low-latency LRU approximation cache in C++ using CLOCK second-chance algorithm. Multi level cache too. Up to 2.5 billion lookups per second.

77
go-cache-benchmark
go-cache-benchmark vmihailenco Go

Cache benchmark for Golang

77
dataracebench
dataracebench llnl C

Data race benchmark suite for evaluating OpenMP correctness tools that aim to detect data races.

77
spiko
spiko trinhminhtriet Rust

🚀 Spiko is a fast, Rust-based load testing tool with a beautiful TUI for real-time insights.

77
rsb
rsb gamelife1314 Rust

An HTTP server benchmark tool written in Rust 🦀

77
WiMANS
WiMANS huangshk Python

[ECCV 2024] WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

77
browserating
browserating kawaiier JavaScript

Compare performance of macOS browsers based on Speedometer 3.1

77
GraphOmni
GraphOmni GAI-Community Python

Enable Comprehensive LLM Evaluation on Graph Reasoning

77
python-pytest-harvest
python-pytest-harvest smarie Python

Store data created during your `pytest` tests execution, and retrieve it at the end of the session, e.g. for applicative benchmarking purposes.

76
RealPDEBench
RealPDEBench AI4Science-WestlakeU Python

[ICLR26 Oral] RealPDEBench: A Benchmark for Complex Physical Systems with Paired Real-World and Simulated Data

76
swt-bench
swt-bench logic-star-ai Python

[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation

76
Advbench
Advbench thunlp Python

Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".

76
LeakDB
LeakDB KIOS-Research Python

LeakDB (Leakage Diagnosis Benchmark) is a realistic leakage dataset for water distribution networks. The dataset is comprised of a large number of art...

76
benchable
benchable MatheusRich Ruby

Write benchmarks without the hassle.

75
heurigym
heurigym cornell-zhang Python

Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization (ICLR'26)

75
DocumentFileCompat
DocumentFileCompat ItzNotABug Kotlin

Blazing fast AndroidX DocumentFile alternative for Android SAF (scoped storage). Up to ~14x faster on large directories.

75
CMI
CMI zju-vipa Python

[IJCAI-2021] Contrastive Model Inversion for Data-Free Knowledge Distillation

75
ipc_benchmark
ipc_benchmark detailyang Python

IPC benchmark on Linux

75
MEDFAIR
MEDFAIR ys-zong Python

[ICLR 2023 spotlight] MEDFAIR: Benchmarking Fairness for Medical Imaging

74
the-cpp-abstraction-penalty
the-cpp-abstraction-penalty germandiagogomez C++

Modern C++ benchmarking

74
TLCBench
TLCBench tlc-pack Python

Benchmark scripts for TVM

74
Okutama-Action
Okutama-Action miquelmarti CSS

Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection

74
Turbo-Histogram
Turbo-Histogram powturbo C

Fastest Histogram Construction

74
IVEBench
IVEBench RyanChenYN Python

[ICLR 2026] IVEBench - Benchmark for Instruction-Guided Video Editing

74
llm-benchmark
llm-benchmark terryyz

A list of LLM benchmark frameworks.

74
Safe-Multi-Agent-Mujoco
Safe-Multi-Agent-Mujoco chauncygu Python

Safe Multi-Agent MuJoCo benchmark for safe multi-agent reinforcement learning research.

73
rpc-bench
rpc-bench bp-alex Java

RPC Benchmark of gRPC, Aeron and KryoNet

73
web-components-benchmark
web-components-benchmark vogloblinsky JavaScript

Web Components benchmark for various Web Components technologies

72
BenchmarkFcns
BenchmarkFcns mazhar-ansari-ardeh C++

A Python and MATLAB implementation of mathematical test functions for benchmarking optimization algorithms.

72
Battery_mark_for_3DS
Battery_mark_for_3DS Core-2-Extreme C

Benchmark your 3DS battery

72
scalajs-benchmark
scalajs-benchmark japgolly Scala

Benchmarks: write in Scala or JS, run in your browser. Live demo:

72
ncnn-benchmark
ncnn-benchmark BUG1989 CMake

Benchmark of ncnn, a high-performance neural network inference framework optimized for mobile platforms

72
One-shot-Human-Parsing
One-shot-Human-Parsing Charleshhy Python

[AAAI 2021] (oral) Progressive One-shot Human Parsing, [TPAMI 2023] End-to-end One-shot Human Parsing

72
TaskMeAnything
TaskMeAnything JieyuZ2 Python

[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.

72