Memory and Bandwidth are All You Need for Fully Sharded Data Parallel

Summary

Authors: Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim

Journal title: ICML'24 Workshop on Advancing Neural Network Training (WANT)

Journal publisher: arXiv

Published year: 2025

DOI identifier: 10.48550/ARXIV.2504.03655