Towards Pareto Optimal Throughput in Small Language Model Serving

Summary

Authors: Pol G. Recasens, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral

Journal title: Proceedings of the 4th Workshop on Machine Learning and Systems

Journal publisher: ACM

Publication year: 2024

DOI: 10.1145/3642970.3655832