VQA

Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage

Multimodal Large Language Models (MLLMs) show strong performance in Visual Question Answering (VQA) but remain limited in fine-grained …

Junfei Xie, Peng Pan, Xulong Zhang

Mita: A Hierarchical Multi-Agent Collaboration Framework with Memory-Integrated and Task Allocation

Recent advances in large language models (LLMs) have substantially accelerated the development of embodied agents. LLM-based …

Xiaojie Zhang, Jianhan Wu, Xiaoyang Qu, Jianzong Wang