Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Streaming Sequence Transduction through Dynamic Compression

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

Efficiently Harnessing Parameter Importance for Better Training

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

Language-Aware Multilingual Machine Translation with Self-Supervised Learning