Abstract: To improve the generation method of the vision-grounded language model ViMac, a core-based visual semantic representation is proposed. With this representation, ViMac can ...