The ability of zero-shot translation emerges when we train a multilingual model on certain translation directions; the model can then translate directly in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and yield unsatisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own predictions step by step at the lower level, while the components are synchronized by a "soft voting" mechanism at the upper level. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, we can distill the ensemble's knowledge back into the multilingual model to improve inference efficiency; notably, our EBBS-based distillation does not sacrifice, and may even improve, translation quality.
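To make the bi-level structure concrete, below is a minimal Python sketch under assumed interfaces. The function ebbs_sketch, the StepFn signature, and all parameter names are hypothetical rather than the paper's code, and the voting rule (averaging component probabilities over the union of their nominated tokens) is one plausible reading of the abstract, not the exact algorithm.

import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[int, ...]
# Hypothetical interface: step_fn(prefix) -> log-probabilities over the vocabulary.
StepFn = Callable[[Prefix], List[float]]

def ebbs_sketch(components: List[StepFn], bos: int, eos: int,
                beam_size: int, max_len: int) -> Prefix:
    """Bi-level beam search sketch: components explore on their own at the
    lower level and are synchronized by soft voting at the upper level."""
    beam: Dict[Prefix, float] = {(bos,): 0.0}  # hypothesis -> cumulative score
    for _ in range(max_len):
        candidates: Dict[Prefix, float] = {}
        for prefix, score in beam.items():
            if prefix[-1] == eos:  # finished hypotheses carry over unchanged
                candidates[prefix] = score
                continue
            logps = [fn(prefix) for fn in components]
            # Lower level: each component nominates its own top-k next tokens.
            nominated = set()
            for lp in logps:
                nominated.update(sorted(range(len(lp)), key=lambda t: -lp[t])[:beam_size])
            # Upper level: "soft voting" -- average the components' probabilities
            # for every nominated token and extend the hypothesis with that score.
            for tok in nominated:
                avg_p = sum(math.exp(lp[tok]) for lp in logps) / len(logps)
                candidates[prefix + (tok,)] = score + math.log(avg_p + 1e-12)
        # Synchronization: all components proceed from the same shared top-k beam.
        beam = dict(sorted(candidates.items(), key=lambda kv: -kv[1])[:beam_size])
        if all(p[-1] == eos for p in beam):
            break
    return max(beam, key=beam.get)

Averaging probabilities (rather than, say, taking a hard majority vote on tokens) keeps the vote soft: a token strongly favored by one component can survive into the shared beam even if the other components rank it lower.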
BibTeX
@misc{wen2024ebbsensemblebilevelbeam,
  title={EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation},
  author={Yuqiao Wen and Behzad Shayegh and Chenyang Huang and Yanshuai Cao and Lili Mou},
  year={2024},
  eprint={2403.00144},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2403.00144},
}