Zero-shot translation ability emerges when a multilingual model is trained on certain translation directions: the model can then translate directly in directions it never saw during training. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and yield less satisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm. At the lower level, each ensemble component explores its own predictions step by step. At the upper level, the components are synchronized by a “soft voting” mechanism. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, we can distill the ensemble’s knowledge back into the multilingual model to improve inference efficiency. Notably, our EBBS-based distillation does not sacrifice translation quality, and can even improve it.
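The abstract only outlines the bi-level search, so what follows is a simplified sketch under our own assumptions, not the authors' implementation. Each ensemble component (for instance, direct translation and an English-pivot path) is abstracted as a hypothetical function that maps a target prefix to a next-token distribution. At the lower level, every component proposes its own top-k continuations of each shared hypothesis. At the upper level, we assume "soft voting" means ranking the pooled proposals by the average of the components' sequence probabilities and keeping the best k. The names `bi_level_beam_search`, `soft_vote`, `direct_model`, `pivot_model`, and `EOS` are all illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical interface: a component maps a target prefix to {next_token: prob}.
Component = Callable[[Prefix], Dict[str, float]]
EOS = "</s>"


def soft_vote(logps: List[float]) -> float:
    """Log of the mean of the components' sequence probabilities (log-sum-exp)."""
    m = max(logps)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(lp - m) for lp in logps) / len(logps))


def bi_level_beam_search(components: List[Component], beam_size: int, max_len: int) -> Prefix:
    # Shared beam: (prefix, per-component cumulative log-probabilities).
    beam: List[Tuple[Prefix, List[float]]] = [((), [0.0] * len(components))]
    finished: List[Tuple[Prefix, List[float]]] = []
    for _ in range(max_len):
        candidates: Dict[Prefix, List[float]] = {}
        for prefix, logps in beam:
            dists = [component(prefix) for component in components]
            # Lower level: each component proposes its own top-k next tokens.
            for dist in dists:
                top_k = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:beam_size]
                for token, _ in top_k:
                    new_prefix = prefix + (token,)
                    if new_prefix in candidates:
                        continue  # already proposed by another component
                    # Score the proposal under every component (zero prob -> -inf).
                    candidates[new_prefix] = [
                        lp + (math.log(d[token]) if d.get(token, 0.0) > 0 else -math.inf)
                        for lp, d in zip(logps, dists)
                    ]
        # Upper level: rank the pooled proposals by soft voting and keep the top k.
        ranked = sorted(candidates.items(), key=lambda kv: soft_vote(kv[1]), reverse=True)
        beam = []
        for prefix, logps in ranked[:beam_size]:
            (finished if prefix[-1] == EOS else beam).append((prefix, logps))
        if not beam:
            break
    pool = finished or beam
    return max(pool, key=lambda kv: soft_vote(kv[1]))[0]


# Toy usage with two hypothetical one-word "translators" that disagree.
def direct_model(prefix: Prefix) -> Dict[str, float]:
    return {"hello": 0.6, "hi": 0.4} if not prefix else {EOS: 1.0}


def pivot_model(prefix: Prefix) -> Dict[str, float]:
    return {"hi": 0.7, "hello": 0.3} if not prefix else {EOS: 1.0}


best = bi_level_beam_search([direct_model, pivot_model], beam_size=2, max_len=5)
print(best)  # ('hi', '</s>')
```

Decoding with the direct component alone would pick "hello" (probability 0.6). Soft voting instead prefers "hi", because its average probability across the two components (0.55) exceeds that of "hello" (0.45).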
BibTeX
@misc{wen2024ebbs,
title={EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation},
author={Yuqiao Wen and Behzad Shayegh and Chenyang Huang and Yanshuai Cao and Lili Mou},
year={2024},
eprint={2403.00144},
archivePrefix={arXiv},
primaryClass={cs.CL}
}