Topic

benchmark

Repositories (1623)

Advbench
Advbench thunlp Python

Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".

50
OpenSceneFlow
OpenSceneFlow KTH-RPL Python

A codebase for point cloud scene flow estimation research. Latest works: Flow4D(RA-L'25), SSF(ICRA'25), SeFlow(ECCV'24), DeFlow(ICRA'24)

50
bmi
bmi cbg-ethz Python

Mutual information estimators and benchmark

50
docker-ab
docker-ab jig Dockerfile

Apache Benchmark Docker image

49
pspdfkit-webassembly-benchmark
pspdfkit-webassembly-benchmark PSPDFKit-labs JavaScript

Source for the PSPDFKit WebAssembly Benchmark: http://iswebassemblyfastyet.com

49
modd
modd bborja MATLAB

Dataset and Evaluation Scripts for Obstacle Detection via Semantic Segmentation in a Marine Environment

49
FLBenchmark-toolkit
FLBenchmark-toolkit AI-secure Python

Federated Learning Framework Benchmark (UniFed)

49
swt-bench
swt-bench logic-star-ai Python

[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation

49
DataGen
DataGen HowieHwong Python

[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models

49
fizzboom
fizzboom darklang F#

Benchmark to compare async web server + interpreter + web client implementations across various languages

48
rust-zero-cost-abstractions
rust-zero-cost-abstractions mike-barber C#

Testing out a Zero Cost Abstraction in Rust compared to similar approaches in C# and Java

48
weather4cast
weather4cast iarai Jupyter Notebook

Code accompanying our IARAI Weather4cast Challenge

48
heatwave-tpch
heatwave-tpch oracle

SQL scripts for HeatWave benchmarking

48
Slime-Mould-Algorithm-A-New-Method-for-Stochastic-Optimization-
Slime-Mould-Algorithm-A-New-Method-for-Stochastic-Optimization- aliasgharheidaricom MATLAB

In this paper, a new stochastic optimizer, which is called slime mould algorithm (SMA), is proposed based upon the oscillation mode of slime mould in...

48
SeasonDepth
SeasonDepth SeasonDepth Python

This package provides a python toolkit for the evaluation on the "SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Mul...

48
optimization-demo
optimization-demo szabolcsdombi C++

⚡ Optimizing Python code by implementing a C++ extension

48
vbr-devkit
vbr-devkit rvp-group Python

Vision Benchmark in Rome Development Kit

48
MedIAnomaly
MedIAnomaly caiyu6666 Python

[MedIA 2025] MedIAnomaly: A comparative study of anomaly detection in medical images

48
MCC5-THU-Gearbox-Benchmark-Datasets
MCC5-THU-Gearbox-Benchmark-Datasets liuzy0708 MATLAB

A benchmark fault diagnosis dataset comprises vibration data collected from a gearbox under variable working conditions with intentionally induced fau...

48
o1_medical
o1_medical UCSC-VLAA Python
48
WiMANS
WiMANS huangshk Python

[ECCV 2024] WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

48
RCAEval
RCAEval phamquiluan Python

[ASE'24][WWW'25] RCAEval: A Benchmark for Root Cause Analysis. https://doi.org/10.1145/3691620.3695065

48
syntherela
syntherela martinjurkovic Python

A package for benchmarking synthetic relational data generation methods

48
react-native-css-in-js-benchmarks
react-native-css-in-js-benchmarks brunolemos JavaScript

CSS in JS Benchmarks for React Native

47
scRNAseq_cell_cluster_labeling
scRNAseq_cell_cluster_labeling jdime Perl

Scripts to run and benchmark scRNA-seq cell cluster labeling methods

47
spatial_index_benchmark
spatial_index_benchmark mloskot C++

Simple non-academic performance comparison of available open source implementations of R-tree spatial index using linear, quadratic and R* balancing a...

47
hyperspectral-soilmoisture-dataset
hyperspectral-soilmoisture-dataset felixriese Jupyter Notebook

Hyperspectral and soil-moisture data from a field campaign based on a soil sample. Karlsruhe (Germany), 2017.

47
benchmarkify
benchmarkify icebob JavaScript

⚡ Benchmark framework for NodeJS

47
Unity-Pathfinding-Jobs-StressTest
Unity-Pathfinding-Jobs-StressTest CristianQiu C#

Unity project showcasing A* pathfinding, fully jobified & burst compiled. It also contains examples of RaycastCommand and BoxcastCommand that are used...

47
PRUDEX-Compass
PRUDEX-Compass TradeMaster-NTU Python

Official implementation of PRUDEX-Compass

47
python-package-manager-shootout
python-package-manager-shootout lincolnloop Makefile

Benchmarking various Python package managers

47
php-benchmark-script
php-benchmark-script sergix44 PHP

A simple PHP script that helps you compare raw performance across servers and PHP versions

47
SWE-bench-Live
SWE-bench-Live microsoft Python

🚀 SWE-bench Goes Live!

47
bots
bots bsc-pm C

Barcelona OpenMP Task Suite is a collection of applications for testing OpenMP tasking implementations and comparing their behaviour under certain...

46
arb
arb TheDuckAI TypeScript

Advanced Reasoning Benchmark Dataset for LLMs

46
ToMBench
ToMBench zhchen18 Python

ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.

46
ReForm-Eval
ReForm-Eval FudanDISC Python

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

46
wordpress-speedtest
wordpress-speedtest szepeviktor PHP

VPS Speedtest for WordPress with 160 results: 🏆 UpCloud (raw memory and CPU benchmark)

45
cloud-workbench
cloud-workbench sealuzh Ruby

Cloud WorkBench (CWB) is a web-based framework that is grounded on the notion of Infrastructure-as-Code (IaC) to foster simple definition, execution,...

45
SQL-ProcBench
SQL-ProcBench microsoft TSQL

SQL-ProcBench is an open benchmark for procedural workloads in RDBMSs.

45
KcBERT-Finetune
KcBERT-Finetune Beomi Python

KcBERT/KcELECTRA Fine Tune Benchmarks code (forked from https://github.com/monologg/KoELECTRA/tree/master/finetune)

45
Dataset
Dataset WebFuzzing Java

Web Fuzzing Dataset (WFD): a set of web/enterprise applications for experimentation in automated system testing

45
Frame-Time-Analysis
Frame-Time-Analysis BoringBoredom TypeScript

Web application that charts and compares multiple frame time logs at the same time. Compatible with FPS benchmarking programs such as PresentMon, OCAT...

45
spiko
spiko trinhminhtriet Rust

🚀 Spiko is a fast, Rust-based load testing tool with a beautiful TUI for real-time insights.

45
multimodal-needle-in-a-haystack
multimodal-needle-in-a-haystack Wang-ML-Lab Python

[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models

45
RediSearchBenchmark
RediSearchBenchmark RediSearch Go

Benchmarks for the RediSearch module

44
node-red-contrib-actionflows
node-red-contrib-actionflows Steveorevo JavaScript

Provides a set of nodes to enable an extendable design pattern for flows.

44
dammmdatagen
dammmdatagen DoktorMike R

Marketing Mix Modeling Data Generator

44
mdl-stance-robustness
mdl-stance-robustness UKPLab Python

Multi-dataset stance detection and robustness experiments

44
weblink
weblink PXshadow Haxe
44