Pretraining Data Mixtures for Transformer Models - arxiv.org
