A version of verl to support diverse tool use
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024]
Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024]
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" [EMNLP 2023]
"TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024]
SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding [ACL 2026]
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents