The first large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models,...
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluating LLMs. One made up of tools...
Kotlin multiplatform benchmarking toolkit
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Performance testing matchers for RSpec
Federated Learning Benchmark - Federated Learning on Non-IID Data Silos: An Experimental Study (ICDE 2022)
BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). BenchMARL lets you quickly compare different MARL algorithms, task...
A Benchmark of Text Classification in PyTorch
Reduce CPU usage with a non-blocking async loop and improve perceived speed in JavaScript
📖 Korean NLU Benchmark
A more modern HTTP framework benchmarker supporting HTTP/1 and HTTP/2 benchmarks.
[CVPR 2020] A Large-Scale Dataset for Real-World Face Forgery Detection
🔥Urban-scale point cloud dataset (CVPR 2021 & IJCV 2022)
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility...
Visual Object Tracking
Serialization library written in C++17 - Pack C++ structs into a compact byte-array without any macros or boilerplate code
Research on evaluating and aligning the values of Chinese large language models
Naive performance comparison of a few programming languages (JavaScript, Kotlin, Rust, Swift, Nim, Python, Go, Haskell, D, C++, Java, C#, Object Pasca...
SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech Recognition.
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Performance benchmarking and testing framework for .NET applications 📈
Java RPC benchmark, inspired by https://www.techempower.com/benchmarks/
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial rel...
The PatchCamelyon (PCam) deep learning classification benchmark.
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
FewCLUE: a few-shot learning evaluation benchmark, Chinese edition
CL-bench: A Benchmark for Context Learning
glmark2 is an OpenGL 2.0 and ES 2.0 benchmark
A global network of probes to run network tests like ping, traceroute and DNS resolve
Current state of supervised and unsupervised depth completion methods
The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static to dynamic eva...
Z-Bench 1.0 by ZhenFund (真格基金): a Muggle's Chinese-language test set for large language models. Z-Bench is an LLM prompt dataset for non-technical users, developed by an enthusiastic AI-focu...
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
JavaScript benchmark for common web developer workloads
Automated CIS Benchmark Compliance Remediation for RHEL 7 with Ansible
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks...
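PaddlePaddle large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other areas.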
A modern iperf3 alternative with a live TUI, multi-client server, and QUIC support. Built in Rust.
The xAST evaluation benchmark: making security tools no longer a "black box".
A coding agent optimized for smaller LLMs
Image classification with NVIDIA TensorRT from TensorFlow models.
OpenML AutoML Benchmarking Framework
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
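VPS test scripts | VPS performance tests (basic VPS information, IO performance, global speed tests, ping, return-route tests), BBR acceleration scripts (a congestion-control algorithm that speeds up TCP), three-carrier speed test scripts (three-carrier speed tests, streaming...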
Benchmark the performance of various Swift layout frameworks (autolayout, UIStackView, PinLayout, LayoutKit, FlexLayout, Yoga, ...)
PHP Profiler & Developer Toolbar (built for Phalcon)
🔥 Stupid Simple CPU/MEM "Profiler" for your JS code.
A universal flight control tuning framework
ParseBench - A Document Parsing Benchmark for AI Agents