Recent video-language models (VidLMs) have performed outstandingly on a variety of video-language tasks. However, such multimodal models also come with drawbacks. For example, it has been shown that vision-language models have difficulty understanding compositional and order relations in images,...
Language gives humans an extraordinary level of general intellect and sets them apart from all other creatures. Importantly, language not only helps people interact better with others, but also improves their capacity to think. Before...

Recent articles

