This repo contains the code for the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking Generative Age...
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
Longitudinal Evaluation of LLMs via Data Compression
A benchmark for standalone WebAssembly
Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)
C# ECS Benchmarks
Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet
🔥 Collection of useful JavaScript snippets with automated benchmarks
R package for benchmarking single cell analysis methods
The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL"
The Arline Benchmarks platform benchmarks various algorithms for quantum circuit mapping/compression against each other on a list of predefined h...
Just a small test to see which language is better for extending Python when using lists of lists
Linköping GraphQL Benchmark (LinGBM)
Benchee (Elixir benchmarking) integration for Livebook
A modular benchmarking library with V8 warmup and CPU/RAM denoising for the most accurate and consistent results.
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
phpbenchmarks.com kit to add your benchmark.
corebench - run your benchmarks against high performance computing servers with many CPU cores
A C++ Thread Pool Colosseum
A benchmark library for Dynamic Algorithm Configuration.
Benchmarking programming languages and web frameworks.
PHP User Agent Parser Benchmarks
The code accompaniment for the CoRL 2020 paper: A User's Guide to Calibrating Robotics Simulators (https://arxiv.org/abs/2011.08985), from NVIDIA Rese...
📚Examples
SERAB: a multi-lingual benchmark for speech emotion recognition
Benchmarks comparing ESNext features to their ES5 and various pre-processor equivalents
A benchmark for compile-time and/or runtime Nim 🏆
Micro-Runner, a CLI playground for benchmarking your JavaScript code
Jakarta EE 5/6/7/8/10 WildFly/JBoss EAP Clustering Benchmark Application
JavaScript test-runners benchmark
A benchmark program for Dgraph.
Publication of our Oktoberfest Food Dataset for Object Detection methods
Performance benchmarks of Python, NumPy, etc. vs. other languages such as MATLAB, Julia, Fortran.
GKE CIS 1.1.0 Benchmark InSpec Profile
The Shopware 6 performance benchmarking toolset, built by Shopware and Tideways.
MongoDB/PostgreSQL JSON benchmark tool (and slides) for Percona EU 2017
Node.js benchmark runner
Simple benchmarks in both Node.js and the browser
Performance comparison of existing GAN-based text-to-image algorithms (GAN-CLS, StackGAN, TAC-GAN).
Extensible stream pipelines with object algebras.
Application for HTTP benchmarking via different rules and configs
The app implements and benchmarks different Core Data persistence options. It supplements the blog post http://www.vadimbulavin.com/how-to-save-images...
Benchmark for [apache/dubbo-go](github.com/apache/dubbo-go)
Benchmarks across Deep Learning Frameworks in Julia and Python
Tool to easily benchmark QML/QtQuick (or your own QML components) performance on different hardware.
RGB-D Scribble-based Segmentation Benchmark
Benchmark a WebSocket server's message throughput ⌛