Topic

benchmark

Repositories (1763)

ClickBench
ClickBench ClickHouse HTML

ClickBench: a Benchmark For Analytical Databases

990
Monocular-Depth-Estimation-Toolbox
Monocular-Depth-Estimation-Toolbox zhyever Python

Monocular Depth Estimation Toolbox based on MMSegmentation.

967
moses
moses molecularsets Python

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

966
AICGSecEval
AICGSecEval Tencent Python

A.S.E (AICGSecEval) is a repository-level AI-generated code security evaluation benchmark developed by Tencent Wukong Code Security Team.

963
KernelBench
KernelBench ScalingIntelligence Jupyter Notebook

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

946
grpc_bench
grpc_bench LesnyRumcajs Dockerfile

Various gRPC benchmarks

935
blazehttp
blazehttp chaitin Go

BlazeHTTP is a simple, easy-to-use tool for testing WAF protection effectiveness.

934
opencv_zoo
opencv_zoo opencv Python

Model Zoo For OpenCV DNN and Benchmarks.

934
agoo
agoo ohler55 C

A High Performance HTTP Server for Ruby

927
nench
nench n-st Shell

VPS benchmark script — based on the popular bench.sh, plus CPU and ioping tests, and dual-stack IPv4 and v6 speedtests by default

914
s3-benchmark
s3-benchmark dvassallo Go

Measure Amazon S3's performance from any location.

910
AoE
AoE didi C++

AoE (AI on Edge: on-device intelligence, edge computing) is an on-device AI integrated runtime environment (IRE) that helps developers work more efficiently.

887
IocPerformance
IocPerformance danielpalme C#

Performance comparison of .NET IoC containers

885
mimic3-benchmarks
mimic3-benchmarks YerevaNN Python

Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.

881
InferenceX
InferenceX SemiAnalysisAI Python

Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Train...

876
rl4co
rl4co ai4co Python

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

865
Celero
Celero DigitalInBlue C++

C++ Benchmark Authoring Library/Framework

860
nvbench
nvbench NVIDIA Cuda

CUDA Kernel Benchmarking Library

856
CBLUE
CBLUE CBLUEbenchmark Python

[CBLUE1] CBLUE, a Chinese medical information processing benchmark: A Chinese Biomedical Language Understanding Evaluation Benchmark

841
CrossPlatformDiskTest
CrossPlatformDiskTest maxim-saplin C#

Windows, macOS and Android storage (HDD, SSD, RAM) speed testing/performance benchmarking app

836
human-learn
human-learn koaning Jupyter Notebook

Natural Intelligence is still a pretty good idea.

832
huststore
huststore Qihoo360 C

High-performance Distributed Storage

830
bencher
bencher bencherdev MDX

🐰 Bencher - Continuous Benchmarking

826
WeatherBench
WeatherBench pangeo-data Jupyter Notebook

A benchmark dataset for data-driven weather forecasting

824
typescript-runtime-type-benchmarks
typescript-runtime-type-benchmarks moltar TypeScript

📊 Benchmark Comparison of Packages with Runtime Validation and TypeScript Support

818
meta-dataset
meta-dataset google-research Jupyter Notebook

A dataset of datasets for learning to learn from few examples

801
sbt-jmh
sbt-jmh sbt Scala

"Trust no one, bench everything." - sbt plugin for JMH (Java Microbenchmark Harness)

797
Programming-Language-Benchmarks
Programming-Language-Benchmarks hanabi1224 C#

Yet another implementation of computer language benchmarks game

794
http_bench
http_bench linkxzhou Go

Go HTTP stress testing tool that supports single-node and distributed modes, and HTTP/1, HTTP/2, and HTTP/3.

792
ISC-Bench
ISC-Bench wuyoscar Python

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

783
r3f-perf
r3f-perf utsuboco TypeScript

Easily monitor your Three.js performance.

772
warp
warp minio Go

S3 benchmarking tool

772
robustbench
robustbench RobustBench Python

RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]

772
HammerDB
HammerDB TPC-Council Tcl

HammerDB: The industry standard open-source database benchmark

752
caffenet-benchmark
caffenet-benchmark ducha-aiki Jupyter Notebook

Evaluation of the CNN design choices performance on ImageNet-2012.

743
OpenCUA
OpenCUA xlang-ai Python

OpenCUA: Open Foundations for Computer-Use Agents

740
tape
tape songlab-cal Python

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of prote...

738
py-frameworks-bench
py-frameworks-bench klen Python

Another benchmark for some python frameworks

722
microservices-framework-benchmark
microservices-framework-benchmark networknt C++

Raw benchmarks on throughput, latency and transfer of Hello World on popular microservices frameworks

719
LightCompress
LightCompress ModelTC Python

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

711
ffi-overhead
ffi-overhead dyu C

Comparing the C FFI (foreign function interface) overhead in various programming languages

696
caliper
caliper hyperledger-caliper JavaScript

A blockchain benchmark framework to measure performance of multiple blockchain solutions https://wiki.hyperledger.org/display/caliper

694
deep_research_bench
deep_research_bench Ayanami0730 Python

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

694
TheAgentCompany
TheAgentCompany TheAgentCompany Python

An agent benchmark with tasks in a simulated software company.

690
PointTinyBenchmark
PointTinyBenchmark ucas-vg Python

Point based and tiny object detection and localization code set of UCAS-VG

688
long-form-factuality
long-form-factuality google-deepmind Python

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

684
BenchmarkTools.jl
BenchmarkTools.jl JuliaCI Julia

A benchmarking framework for the Julia language

668
AI_Diplomacy
AI_Diplomacy GoodStartLabs Python

Frontier Models playing the board game Diplomacy.

656
openmixup
openmixup Westlake-AI Python

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

656
datasets
datasets benedekrozemberczki

A repository of pretty cool datasets that I collected for network science and machine learning research.

652