LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark of 29,000 questions for evaluating multimodal models across eight domains.