Program to benchmark various speech recognition APIs
EvoEval: Evolving Coding Benchmarks via LLM
List of Ruby tools for performance work.
The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained I...
CPU micro benchmarks
A benchmark framework based on Golang
Automated Benchmarking System for Vitess
🚀 A comprehensive performance comparison benchmark between different .NET collections.
Framework for benchmarking fully-managed vector databases
[NeurIPS 2024] Terra: A Multimodal Spatio-Temporal Dataset Spanning the Earth
Lakehouse storage system benchmark
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
Human Benchmark is a Flutter app for Android that features many tests to assess your abilities.
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
Benchmark for some popular PHP Dependency Injection Containers.
SUES-200: A Multi-height Multi-scene Cross-view Image Benchmark Across Drone and Satellite
A low-latency LRU approximation cache in C++ using the CLOCK second-chance algorithm. Also supports multi-level caching. Up to 2.5 billion lookups per second.
Cache benchmark for Golang
Data race benchmark suite for evaluating OpenMP correctness tools that aim to detect data races.
🚀 Spiko is a fast, Rust-based load testing tool with a beautiful TUI for real-time insights.
An HTTP server benchmark tool written in Rust 🦀
[ECCV 2024] WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Compare performance of macOS browsers based on Speedometer 3.1
Enable Comprehensive LLM Evaluation on Graph Reasoning
Store data created during your `pytest` test execution, and retrieve it at the end of the session, e.g. for applicative benchmarking purposes.
[ICLR26 Oral] RealPDEBench: A Benchmark for Complex Physical Systems with Paired Real-World and Simulated Data
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
LeakDB (Leakage Diagnosis Benchmark) is a realistic leakage dataset for water distribution networks. The dataset is comprised of a large number of art...
Write benchmarks without the hassle.
Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization (ICLR'26)
Blazing fast AndroidX DocumentFile alternative for Android SAF (scoped storage). Up to ~14x faster on large directories.
[IJCAI-2021] Contrastive Model Inversion for Data-Free Knowledge Distillation
IPC benchmark on Linux
[ICLR 2023 spotlight] MEDFAIR: Benchmarking Fairness for Medical Imaging
Modern C++ benchmarking
Benchmark scripts for TVM
Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection
Fastest Histogram Construction
[ICLR 2026] IVEBench - Benchmark for Instruction-Guided Video Editing
A list of LLM benchmark frameworks.
Safe Multi-Agent MuJoCo benchmark for safe multi-agent reinforcement learning research.
RPC Benchmark of gRPC, Aeron and KryoNet
Web Components benchmark for various Web Components technologies
A Python and MATLAB implementation of mathematical test functions for benchmarking optimization algorithms.
Benchmark your 3DS battery
Benchmarks: write in Scala or JS, run in your browser. Live demo:
Benchmark of ncnn, a high-performance neural network inference framework optimized for mobile platforms
[AAAI 2021] (oral) Progressive One-shot Human Parsing, [TPAMI 2023] End-to-end One-shot Human Parsing
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.