Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
A codebase for point cloud scene flow estimation research. Latest works: Flow4D(RA-L'25), SSF(ICRA'25), SeFlow(ECCV'24), DeFlow(ICRA'24)
Mutual information estimators and benchmark
Apache Benchmark Docker image
Source for the PSPDFKit WebAssembly Benchmark: http://iswebassemblyfastyet.com
Dataset and Evaluation Scripts for Obstacle Detection via Semantic Segmentation in a Marine Environment
Federated Learning Framework Benchmark (UniFed)
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models
Benchmark to compare async web server + interpreter + web client implementations across various languages
Testing out a Zero Cost Abstraction in Rust compared to similar approaches in C# and Java
Code accompanying our IARAI Weather4cast Challenge
SQL scripts for HeatWave benchmarking
In this paper, a new stochastic optimizer, called the slime mould algorithm (SMA), is proposed based on the oscillation mode of slime mould in...
This package provides a python toolkit for the evaluation on the "SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Mul...
:zap: Optimizing Python code by implementing a C++ extension
Vision Benchmark in Rome Development Kit
[MedIA 2025] MedIAnomaly: A comparative study of anomaly detection in medical images
A benchmark fault diagnosis dataset comprising vibration data collected from a gearbox under variable working conditions with intentionally induced fau...
[ECCV 2024] WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
[ASE'24][WWW'25] RCAEval: A Benchmark for Root Cause Analysis. https://doi.org/10.1145/3691620.3695065
A package for benchmarking synthetic relational data generation methods
CSS in JS Benchmarks for React Native
Scripts to run and benchmark scRNA-seq cell cluster labeling methods
Simple non-academic performance comparison of available open source implementations of R-tree spatial index using linear, quadratic and R* balancing a...
Hyperspectral and soil-moisture data from a field campaign based on a soil sample. Karlsruhe (Germany), 2017.
:zap: Benchmark framework for Node.js
Unity project showcasing A* pathfinding, fully jobified & burst compiled. It also contains examples of RaycastCommand and BoxcastCommand that are used...
Official implementation of PRUDEX-Compass
Benchmarking various Python package managers
A simple PHP script that helps you compare raw performance across servers and PHP versions
🚀 SWE-bench Goes Live!
Barcelona OpenMP Task Suite is a collection of applications for testing OpenMP tasking implementations and comparing their behaviour under certain...
Advanced Reasoning Benchmark Dataset for LLMs
ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
VPS Speedtest for WordPress with 160 results: 🏆 UpCloud (raw memory and CPU benchmark)
Cloud WorkBench (CWB) is a web-based framework that is grounded on the notion of Infrastructure-as-Code (IaC) to foster simple definition, execution,...
SQL-ProcBench is an open benchmark for procedural workloads in RDBMSs.
KcBERT/KcELECTRA fine-tuning benchmark code (forked from https://github.com/monologg/KoELECTRA/tree/master/finetune)
Web Fuzzing Dataset (WFD): a set of web/enterprise applications for experimentation in automated system testing
Web application that charts and compares multiple frame time logs at the same time. Compatible with FPS benchmarking programs such as PresentMon, OCAT...
🚀 Spiko is a fast, Rust-based load testing tool with a beautiful TUI for real-time insights.
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
Benchmarks for the RediSearch module
Provides a set of nodes to enable an extendable design pattern for flows.
Marketing Mix Modeling Data Generator
Multi-dataset stance detection and robustness experiments