Learning high-quality representations for data from different modalities but with a shared underlying meaning has been a critical building block for information retrieval. Moreover, hard negative mining has been shown to be effective in forcing models to learn discriminative features.
In this paper, we present a new technique for hard negative mining for learning visual-semantic embeddings for cross-modal retrieval. We focus on selecting hard negative pairs that are sampled by an adversarial generator.
In settings with attention, our adversarial generator composes harder negatives through novel combinations of image regions across different images for a given caption. We find not only higher scores across the board for all R@K-based metrics, but also that this technique is significantly more sample efficient and converges in fewer iterations.
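For readers unfamiliar with the terms, the following is a minimal sketch of plain in-batch hard negative mining and of the R@K metric. It is not the adversarial generator described above: it uses the simpler, commonly used strategy of taking the highest-scoring non-matching item in a batch as the negative. The function names, the margin value, and the convention that matching image-caption pairs lie on the diagonal of the similarity matrix are all assumptions made for illustration.

```python
import numpy as np


def hardest_negative_loss(sim, margin=0.2):
    """Hinge loss using the hardest in-batch negative for each query.

    sim[i, j] is the similarity between image i and caption j.
    Assumption of this sketch: matching pairs sit on the diagonal.
    """
    n = sim.shape[0]
    pos = np.diag(sim)                        # score of each true pair
    mask = np.eye(n, dtype=bool)
    neg = np.where(mask, -np.inf, sim)        # exclude true pairs from the search
    hardest_caption = neg.max(axis=1)         # hardest caption for each image
    hardest_image = neg.max(axis=0)           # hardest image for each caption
    loss_i2t = np.maximum(0.0, margin + hardest_caption - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_image - pos)
    return float((loss_i2t + loss_t2i).sum())


def recall_at_k(sim, k):
    """Fraction of row queries whose true match (diagonal) ranks in the top k."""
    pos = np.diag(sim)[:, None]
    rank = (sim > pos).sum(axis=1)            # items scored strictly above the true match
    return float((rank < k).mean())


# Toy 3x3 similarity matrix (hypothetical values).
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
loss = hardest_negative_loss(sim)
r1 = recall_at_k(sim, 1)
r2 = recall_at_k(sim, 2)
```

In this toy matrix the second image scores higher against the first caption than against its own, so it is the only query that misses at K=1 and it contributes most of the loss. A learned generator, as in the abstract, would replace the `max` over existing batch items with negatives it composes itself.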
Related Research
-
Interpretation for Variational Autoencoder Used to Generate Financial Synthetic Tabular Data
J. Wu, K. N. Plataniotis, *L. Z. Liu, *E. Amjadian, and Y. A. Lawryshyn. Special Issue on Interpretability, Accountability and Robustness in Machine Learning (Algorithms)
Publications
-
ATOM: Attention Mixer for Efficient Dataset Distillation
*S. Khaki, *A. Sajedi, K. Wang, L. Z. Liu, Y. A. Lawryshyn, and K. N. Plataniotis. Oral presentation at The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR)
Publications
-
DataDAM: Efficient Dataset Distillation with Attention Matching
*A. Sajedi, *S. Khaki, E. Amjadian, L. Z. Liu, Y. A. Lawryshyn, and K. N. Plataniotis. International Conference on Computer Vision (ICCV)
Publications