Tokenizer Choice For LLM Training: Negligible or Crucial?

Summary

Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Buschhoff, Charvi Jain, Alexander Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte

Journal title: Findings of the Association for Computational Linguistics: NAACL 2024

Journal publisher: Association for Computational Linguistics

Published year: 2024

DOI identifier: 10.18653/v1/2024.findings-naacl.247