You can find the most recent KGQA benchmark numbers from publications here.
TSB-AD: Towards A Reliable Time-Series Anomaly Detection Benchmark
[RA-L2022] V2X-Sim Dataset and Benchmark
Instruments for benchmarking, tracing, and debugging Factory Girl models.
The Unity Performance Benchmark tool enables partners and developers to establish benchmark samples and measurements using the Performance Testing pac...
🌍 PDDL instances covering the International Planning Competitions
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.
benchmarking quantum circuit emulators for your daily research usage
The repository includes PyTorch code, and the data, to reproduce the results for our paper titled "A Machine Learning Benchmark for Facies Classificat...
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
A benchmark list for evaluation of large language models.
Video Copy Segment Localization (VCSL) dataset and benchmark [CVPR2022]
[TPAMI 2022] "Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study" by Tianlong Chen*, Kaixiong Zhou*, Keyu Duan, W...
Volley is a benchmarking tool for measuring the performance of server networking stacks.
A modular toolbox for meta-learning research with a focus on speed and reproducibility.
FBPro Audit Test Automation Package allows you to create compliance reports for your systems. The resulting HTML-reports provide a transparent overvie...
Python-based portfolio / stock widget which sources data from Yahoo Finance and calculates different types of Value-at-Risk (VaR) metrics and many oth...
Java Virtual Machine (JVM) Performance Benchmarks with a primary focus on top-tier Just-In-Time (JIT) Compilers, such as C2 JIT, Graal JIT, and the Fa...
MVP Benchmark for Multi-View Partial Point Cloud Completion and Registration
A benchmark that challenges language models to code solutions for scientific problems
CORe50: a new Dataset and Benchmark for Continual Learning
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in C...
The official implementation of the ACM MM'21 paper Co-learning: Learning from noisy labels with self-supervision.
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of prote...
Benchmark CS:GO on any map
Meter is a simple micro-benchmarking tool for Android (and Java) projects. This is not a profiler; it is a very small utility class that is designed fo...
CPU Ultimate Latency Test.
[NeurIPS 2024] Touchstone - Benchmarking AI on 5,172 o.o.d. CT volumes and 9 anatomical structures
Simple HTTP benchmark for different nodejs frameworks using wrk
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLL...
measure startup time of your react-native app
The evaluation benchmark on MCP servers
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
[ICLR 2025] Benchmarking Agentic Workflow Generation
A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]
Simple comparison of code execution speed between different options
Sample benchmark files for Hyperledger Caliper https://wiki.hyperledger.org/display/caliper
Test/benchmark regression and comparison system with dashboard
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
A benchmark for prompt injection detection systems.
MTAD: Tools and Benchmark for Multivariate Time Series Anomaly Detection
A collection of RISC-V Vector (RVV) benchmarks to help developers write portably performant RVV code
Playing around "Less Slow" coding practices in Python, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines
A tool for benchmarking the render performance of React components
comparing the execution speeds of various programming languages
Playing around "Less Slow" coding practices in Rust, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines
benchyou is a benchmark tool for MySQL with real-time monitoring of TPS and vmstat/iostat
A hello world benchmark for the available Rust Web Frameworks: hyper vs gotham vs actix-web vs warp vs rocket
[ICRA2021] A unified benchmark for the evaluation of mobile robot local planning approaches
Shuhai is a memory benchmarking tool that allows FPGA programmers to demystify all the underlying details of memories, e.g., HBM and DDR4, on a Xilinx...