Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words
[NeurIPS 2023] A Dataset and Benchmark for Pose-agnostic Anomaly Detection.
We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that the OOD accur...
BIRL: Benchmark on Image Registration methods with Landmark validations
Latency Benchmarking tool
[WACV 2023] Information and scripts for the CropAndWeed Dataset
A tool for benchmarking usage of Vault.
This is an open-source tool to assess and improve the trustworthiness of AI systems.
Python module for CEC 2017 single objective optimization test function suite.
Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks
A benchmark utility for POSIX shell comparison
Fastest Trie structure (Linux & Windows)
A Karma plugin to run Benchmark.js over multiple browsers with CI compatible output.
This is a benchmark for domain generalization-based fault diagnosis (code related to domain generalization)
UME::SIMD: A library for explicit SIMD vectorization.
How good are LLMs at chemistry?
A collection of PHP benchmarks
DTB70 -- A Drone Tracking Benchmark
Serverreview Benchmark Script v3
RNN benchmarks of PyTorch, TensorFlow, and Theano
Comparison and benchmark of JavaScript serialization libraries (Protocol Buffer, Avro, BSON, etc.)
Benchmarking framework for index structures on persistent memory
Microbenchmarks comparing the Julia Programming language with other languages
High fidelity benchmark runner
A selection of ANSI C benchmarks and programs useful as benchmarks
Low-level dotnet network benchmark for UDP socket performance (.NET and Unity compatible)
[IJCAI 2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.
🏞️ [IEEE ICRA2023] The official repository for paper "Wild-Places: A Large-Scale Dataset for Lidar Place Recognition in Unstructured Natural Environm...
The Benchmark⏲ module provides methods to measure and report the time used to execute Swift code.
🔥🔥🔥 Latest Advances on Large Recommendation Models
(NeurIPS 2024) Official PyTorch implementation of LOVA3
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
Provides a full reinforcement learning benchmark on MuJoCo environments, including DDPG, SAC, TD3, PG, A2C, PPO, library
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Performance comparison of the OpenGL and Vulkan APIs.
Turkish LM Tuner
Benchmark different versions of the same or similar gems, plus static analysis of the Gemfile and installed gem library source code
HTTP Load Generator
Check your internet speed/bandwidth right from your terminal. Built in Go using chromedp
ncnn Android benchmark app
A thread-safe fixed-size circular buffer written in safe Rust.
Mapping of the SimpleQuestions dataset to Wikidata
TUM Traffic Dataset Development Kit
Locust4j is a load generator for locust, written in Java.
A tool for examining GPU scheduling behavior.
An elegant RTSS Overlay to showcase your benchmark stats in style.
Generate performance reports from your django database performance tests.
Libsodium WebAssembly benchmark results.