ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

Summary

Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

Journal title: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024)

Journal publisher: OpenReview.net

Published year: 2024