Official code for "What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks" (NeurIPS 2023)
RTSS / RivaTuner Overlay
Build your own Game-Engine based on the Entity Component System concept in Golang.
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Automatically download VPR datasets in a standard format
Benchmarking State-of-the-Art Deep Learning Software Tools
Evaluating long-term memory of reinforcement learning algorithms
[CVPRW 2022] Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets
Transfer Learning Shootout for PyTorch's model zoo (torchvision)
performance benchmark infrastructure for IPLD DAGs
A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
Collection of hyperparameter optimization benchmark problems
FBPro Audit Test Automation Package allows you to create compliance reports for your systems. The resulting HTML-reports provide a transparent overvie...
A benchmark list for evaluation of large language models.
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12...
choosing FFT library...
Benchmarking and Analyzing Point Cloud Perception Robustness under Corruptions
LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
Quick and dirty backup tool benchmark with reproducible results
The OpenSSF CVE Benchmark consists of code and metadata for over 200 real life CVEs, as well as tooling to analyze the vulnerable codebases using a va...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
Ollama-based benchmark with detailed input/output tokens per second. Python with DeepSeek R1 example.
Benchmark any tool from the CLI
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
The Lodash for GenAI: Real Value + Consistent + Model-Agnostic
Benchmarks compilation speeds of different combinations of languages and compilers.
Rust libraries and programs focused on succinct data structures
A micro/macro benchmark framework for the Python programming language that helps with optimizing your software.
Fully-featured benchmark driver for Ruby
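The jieba-analysis tool for Java. (A more flexible, elegant, easy-to-use, high-performance Java word segmentation implementation based on the jieba segmentation dictionary. Supports part-of-speech tagging.)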
CodSpeed is the all-in-one performance testing toolkit. Optimize code performance and catch regressions early.
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in C...
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
7GUIs is a GUI programming usability benchmark.
A Gatling plugin for running load tests on Apache Dubbo (https://github.com/apache/incubator-dubbo) and other parts of the Java ecosystem.
wake word engine benchmark framework
A TCP/UDP load generator that provides fine-grained, flow-level control in Go.
Server Info & Check Kit
C++ benchmark tool. Practical, stable and fast performance testing framework.
A codebase for point cloud scene flow estimation research. Latest works: TeFlow(CVPR'26), DeltaFlow(NeurIPS'25), HiMo(T-RO'25), VoteFlow(CVPR'25), Flo...
ICCV 2023, project page of the paper "DeepChange: A Long-term Person Re-identification Benchmark"
(ACL 2025 Main) A Comprehensive Benchmark for Code Information Retrieval.
C++ Mathematical Expression Parser Benchmark
A micro-service reference test application for model extraction, cloud management, energy efficiency, power prediction, single- and multi-tier auto-sc...
A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and as simple as possible...
[ICLR 2020] NAS evaluation is frustratingly hard
🌍 PDDL instances covering the International Planning Competitions