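Evaluation and alignment research on the values of Chinese large language models
FewCLUE: a few-shot learning evaluation benchmark (Chinese edition)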
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
A global network of probes to run network tests like ping, traceroute and DNS resolution
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Z-Bench 1.0 by ZhenFund (真格基金): a Chinese-language test set for large language models, written by "muggles". Z-Bench is an LLM prompt dataset for non-technical users, developed by an enthusiastic AI-focu...
The PatchCamelyon (PCam) deep learning classification benchmark.
Visual Object Tracking
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Comp...
Knowledge Graph Generation from Any Text
glmark2 is an OpenGL 2.0 and ES 2.0 benchmark
Automated CIS Benchmark Compliance Remediation for RHEL 7 with Ansible
JavaScript benchmark for common web developer workloads
Current state of supervised and unsupervised depth completion methods
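PaddlePaddle (飞桨) large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal models, biocomputing models and other domains.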
Image classification with NVIDIA TensorRT from TensorFlow models.
Benchmark the performance of various Swift layout frameworks (autolayout, UIStackView, PinLayout, LayoutKit, FlexLayout, Yoga, ...)
🔥 Stupid Simple CPU/MEM "Profiler" for your JS code.
PHP Profiler & Developer Toolbar (built for Phalcon)
Benchmarks of JavaScript Package Managers
OpenML AutoML Benchmarking Framework
Chinese Biomedical Language Understanding Evaluation benchmark (ChineseBLUE)
A universal flight control tuning framework
A simple app to test blur algorithms for visual quality and performance.
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
SB(SRS Bench) is a set of benchmark and regression test tools, for SRS and other media servers, supports HTTP-FLV, RTMP, HLS, WebRTC and GB28181.
The xAST evaluation benchmark makes security tools no longer a "black box".
Database Benchmarking Framework
An extensive evaluation and comparison of 28 state-of-the-art superpixel algorithms on 5 datasets.
Remove unwanted files and directories from your node_modules folder
A collection of MARL benchmarks based on TorchRL
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
FedScale is a scalable and extensible open-source federated learning (FL) platform.
An agent benchmark with tasks in a simulated software company.
Continuous Benchmark for Go Project
Swift benchmark runner with many performance metrics and great CI support
Jetson Benchmark
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Face landmark (fiducial point) detection benchmark
The First Dynamic Map Removal Benchmark | Includes 8 SOTA methods | Continuously updated
⚡An easy-to-use and optimized compression library for .NET that unifies several compression algorithms including LZ4, Snappy, Zstd, LZMA, Brotli, GZi...
DANCE: a deep learning library and benchmark platform for single-cell analysis
SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.
[ICCV 2023] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
RGB-D Salient Object Detection: A Survey
Puck is a high-performance ANN search engine
Generate benchmarks for terminal emulators
Are We Fast Yet? Comparing Language Implementations with Objects, Closures, and Arrays