Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large La...
Benchmark of Node.js worker threads for calculating prime numbers using various data structures
Tiny Image in JavaScript - Edge Detection Algorithms
Comparison of C++ Serialization Libraries for Graph Data
Masked face recognition focuses on identifying people using their facial features while they are wearing masks. We introduce benchmarks on face verifi...
Ray-triangle intersection performance tests in various languages
Repository for the paper "ViHOS: Vietnamese Hate and Offensive Spans Detection" (EACL2023)
Swift port of HdrHistogram
[NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
Get 3D motion vectors / scene flow directly from Blender
GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models
This framework provides logging, benchmarking and monitoring.
Software 3D renderer & rasteriser written in WASM/C & TypeScript to test / showcase WebAssembly and compare performance
goku is an HTTP load-testing application written in Rust
🧀 The Benchmark Testing Box
Instant search for and access to many datasets in PySpark.
A simple benchmark comparing Lua performance to Vimscript (because no one seems to care about these nowadays)
[EMNLP 2022 Findings] Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
Dataset, metrics, and models for TACL 2023 paper MACSUM: Controllable Summarization with Mixed Attributes.
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
A Micro-benchmarking Framework for Python Type Inference Tools
A comprehensive local Linux Privilege-Escalation Benchmark
A multiplatform benchmark designed to provide a holistic, detailed, and close-to-hardware view of memory system performance with a family of bandwidth--lat...
A collection of std-like containers written in C++11. Features fast unordered flat map/set, configurable double-ended vector and sparse deque.
The Redis benchmarks specification describes the cross-language/tools requirements and expectations to foster performance and observability standards...
DafnyBench: A Benchmark for Formal Software Verification
Tiny QA Benchmark++ a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultr...
[VLDB 2023] Model Selection for Anomaly Detection in Time Series
The Zebrafish Activity Prediction Benchmark measures progress on the problem of predicting cellular-resolution neural activity throughout an entire ve...
☁️ Benchmarking LLMs for Cloud Config Generation | Benchmarking large models in cloud scenarios
CIS settings bootstrapper for Mac
Pleasures for Web in Golang
Collection of Suffix Array Construction Algorithms (SACAs)
Quick and easy resource usage monitoring and benchmarking for any command's CPU, memory, disk usage and runtime.
The official Python toolkit for running experiments and evaluating performance on the VideoCube benchmark @TPAMI2023
Generate markdown comparison tables from `cargo-criterion` JSON output
A collection of RISC-V Vector (RVV) benchmarks to help developers write portably performant RVV code. (Results)
I/O benchmark for different Python image-processing libraries.
BeHonest: Benchmarking Honesty in Large Language Models
Critical difference diagrams with Python and Tikz
Benchmark of Lua <-> C++ binding libraries
:hammer: :wrench: Test Driven Development :repeat: with Golang :hamster:
The Stanford Word Substitution (Swords) Benchmark
ansible-vault CLI reimplemented in Go
A unified framework of quality enhancement approaches for compressed images based on PyTorch.
[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences
Heterogeneous, Task- and Domain-Specific Benchmark for Unsupervised Sentence Embeddings used in the TSDAE paper: https://arxiv.org/abs/2104.06979.
A Comprehensive and Versatile Open-Source Federated Learning Framework
[NeurIPS DBT 2021] HPO-B
LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each other or to 50 i...