Most popular benchmark repositories and open-source projects

DSRL liuzuxin Python

🔥 Datasets and env wrappers for offline safe reinforcement learning

131 7 131

MVP_Benchmark paul007pl Python

MVP Benchmark for Multi-View Partial Point Cloud Completion and Registration

130 12 130

quantum-benchmarks yardstiq OpenQASM

Benchmarking quantum circuit emulators for your daily research usage

128 27 128

PHP-Frameworks-Bench myaaghubi PHP

A library to make benchmarks from PHP frameworks.

128 7 128

TAB decisionintelligence Jupyter Notebook

[PVLDB 2025] TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

128 14 128

factory_bot_instruments shiroyasha Ruby

Instruments for benchmarking, tracing, and debugging Factory Girl models.

127 10 127

FinTSB TongjiFinLab Python

FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting (ICAIF'25 Workshop Best Paper)

127 20 127

volley jonhoo C

Volley is a benchmarking tool for measuring the performance of server networking stacks.

127 11 127

RCAEval phamquiluan Jupyter Notebook

[FSE'26, WWW'25, ASE'24] RCAEval: A Benchmark for Root Cause Analysis.

127 30 127

optimum-transformers AlekseyKorshuk Python

Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

126 8 126

HaluMem MemTensor Python

HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.

126 14 126

meta-blocks alshedivat Python

A modular toolbox for meta-learning research with a focus on speed and reproducibility.

125 9 125

Deep_GCN_Benchmarking VITA-Group Python

[TPAMI 2022] "Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study" by Tianlong Chen*, Kaixiong Zhou*, Keyu Duan, W...

125 22 125

PerformanceBenchmarkReporter Unity-Technologies JavaScript

The Unity Performance Benchmark tool enables partners and developers to establish benchmark samples and measurements using the Performance Testing pac...

125 21 125

MM-NIAH OpenGVLab Python

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLL...

125 6 125

MMLU-CF microsoft

A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]

125 3 125

less_slow.py ashvardanian Python

Playing around with "Less Slow" coding practices in Python, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines

125 11 125

local-planning-benchmark NKU-MobFly-Robotics C++

[ICRA2021] A unified benchmark for the evaluation of mobile robot local planning approaches

124 30 124

awesome-human-activity-recognition Leooo-Huang Python

Always up-to-date, most comprehensive HAR resource — continuously scanned and auto-updated from Papers with Code. 53 datasets integrated across all mo...

124 2 124

less_slow.rs ashvardanian Rust

Playing around with "Less Slow" coding practices in Rust, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines

124 6 124

Python_Portfolio__VaR_Tool MBKraus Python

Python-based portfolio / stock widget which sources data from Yahoo Finance and calculates different types of Value-at-Risk (VaR) metrics and many oth...

123 40 123

core50 vlomonaco Python

CORe50: a new Dataset and Benchmark for Continual Learning

123 26 123

pglib-uc power-grid-lib Python

Benchmarks for the Unit Commitment Problem

123 36 123

AMO-Bench meituan-longcat Python

This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions".

122 2 122

Co-learning chengtan9907 Python

The official implementation of the ACM MM'21 paper Co-learning: Learning from noisy labels with self-supervision.

121 13 121

tape-neurips2019 songlab-cal Python

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of prote...

121 34 121

MTKD-CD circleLZY Python

Official implementation for "JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework"

121 19 121

Awesome-Large-Recommendation-Models USTC-StarTeam

🔥🔥🔥 Latest Advances on Large Recommendation Models

121 0 121

RHEL8-STIG ansible-lockdown YAML

Automated STIG Benchmark Compliance Remediation for RHEL 8 with Ansible

120 68 120

dana google JavaScript

Test/benchmark regression and comparison system with dashboard

120 22 120

UniPercept thunderbolt215 Python

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

120 1 120

meter OleksandrKucherenko Java

Meter is a simple micro-benchmarking tool for Android (and Java) projects. This is not a profiler; it is a very small utility class that designed fo...

120 6 120

FollowBench YJiangcm Python

[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

119 18 119

benchmark TheDragonCode PHP

Simple comparison of code execution speed between different options

119 3 119

cult asmjit C++

CPU Ultimate Latency Test.

119 17 119

RiOSWorld yjyddq HTML

[NeurIPS 2025] Official repository of RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

118 8 118

ping_pong_bench IlyaGusev Python

A benchmark for role-playing language models

117 10 117

marbert UBC-NLP

UBC ARBERT and MARBERT Deep Bidirectional Transformers for Arabic

117 17 117

node-frameworks-benchmark hbakhtiyor JavaScript

Simple HTTP benchmark for different nodejs frameworks using wrk

117 24 117

Shuhai RC4ML SystemVerilog

Shuhai is a memory benchmarking tool that allows FPGA programmers to demystify all the underlying details of memories, e.g., HBM and DDR4, on a Xilinx...

117 24 117