Abstract: Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects by merely knowing their semantic description. To ...
Abstract: Open-vocabulary object detection (OVD) aims to detect novel object concepts by mining region-word correspondences from image-text pairs, yet current methods often produce false ...